Is the spark-streaming-kafka-0-10 lib supported?
Problem description:
My Kafka cluster is version 0.10.0.0, and I want to use PySpark Streaming to read data from Kafka. However, in the Spark Streaming + Kafka Integration Guide, http://spark.apache.org/docs/latest/streaming-kafka-0-10-integration.html, there is no Python code example. So can PySpark use spark-streaming-kafka-0-10 to integrate with Kafka? Is the spark-streaming-kafka-0-10 lib supported?
Thanks a lot for your help!
Answer
I also use Spark Streaming with a Kafka 0.10.0 cluster. The 0-10 connector does not have a Python API, but the spark-streaming-kafka-0-8 connector does, and it is compatible with brokers 0.8.2.1 and higher, including 0.10. After adding the following line to your Spark configuration, you can do it easily:
spark.jars.packages org.apache.spark:spark-streaming-kafka-0-8_2.11:2.0.0
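Alternatively, the package can be supplied at submit time rather than through the configuration file. A minimal sketch, assuming your job is in a file named sample_kafka.py (the script name is a placeholder):

```shell
# Pull in the Kafka 0.8 connector at submit time
spark-submit \
  --packages org.apache.spark:spark-streaming-kafka-0-8_2.11:2.0.0 \
  sample_kafka.py
```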
Here is an example in Python:
from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

# Initialize SparkContext
sc = SparkContext(appName="sampleKafka")

# Initialize the streaming context with a 10-second batch interval
batchInterval = 10
ssc = StreamingContext(sc, batchInterval)

# Kafka topic to consume, mapped to the number of consumer threads
topic = {"myTopic": 1}

# Consumer group id for this application
groupId = "myTopic"

# ZooKeeper connection string
zkQuorum = "zookeeperhostname:2181"

# Create the Kafka stream (receiver-based, connecting via ZooKeeper)
kafkaStream = KafkaUtils.createStream(ssc, zkQuorum, groupId, topic)

# Do as you wish with your stream

# Start the stream and block until it terminates
ssc.start()
ssc.awaitTermination()
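Each record delivered by createStream is a (key, message) tuple of strings. As a sketch of the "do as you wish" step, assuming the messages carry JSON payloads (the parse_record helper and the payload format are illustrative assumptions, not part of the original answer), you could map a small pure-Python function over the stream:

```python
import json

def parse_record(record):
    """Parse one (key, message) tuple from the Kafka stream.

    Assumes the message payload is a JSON object; returns None
    for messages that fail to parse, so they can be filtered out.
    """
    _key, message = record
    try:
        return json.loads(message)
    except (ValueError, TypeError):
        return None

# Hypothetical usage inside the streaming job, mirroring the
# kafkaStream created above:
# parsed = kafkaStream.map(parse_record).filter(lambda x: x is not None)
```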