Fixing java.lang.UnsatisfiedLinkError: org.xerial.snappy.SnappyNative.maxCompressedLength(I)I in spark-shell
A problem I hit while running the Schema Merging example code, and how I fixed it:
1. Example code:
// This is used to implicitly convert an RDD to a DataFrame.
import spark.implicits._
// Create a simple DataFrame, store into a partition directory
val squaresDF = spark.sparkContext.makeRDD(1 to 5).map(i => (i, i * i)).toDF("value", "square")
squaresDF.write.parquet("data/test_table/key=1")
// Create another DataFrame in a new partition directory,
// adding a new column and dropping an existing column
val cubesDF = spark.sparkContext.makeRDD(6 to 10).map(i => (i, i * i * i)).toDF("value", "cube")
cubesDF.write.parquet("data/test_table/key=2")
// Read the partitioned table
val mergedDF = spark.read.option("mergeSchema", "true").parquet("data/test_table")
mergedDF.printSchema()
// The final schema consists of all 3 columns in the Parquet files together
// with the partitioning column appeared in the partition directory paths
// root
// |-- value: int (nullable = true)
// |-- square: int (nullable = true)
// |-- cube: int (nullable = true)
// |-- key: int (nullable = true)
2. Problem: as soon as the example writes a DataFrame out as Parquet, spark-shell fails with java.lang.UnsatisfiedLinkError: org.xerial.snappy.SnappyNative.maxCompressedLength(I)I. Parquet output in Spark is snappy-compressed by default, which is why the write is what trips the error.
3. A quick Baidu search turned up reports that Spark's bundled jars can be missing snappy-java. Taking that hint, I checked {SPARK_HOME}/jars and the jar was indeed not there. Since I had already installed plenty of software on this machine, I searched the disk for snappy-java and, sure enough, found several copies.
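Before copying anything around, you can confirm the diagnosis from inside spark-shell. This is a minimal probe, not part of the original fix: Class.forName checks that the jar is on the classpath, and Snappy.getNativeLibraryVersion (a method on snappy-java's org.xerial.snappy.Snappy class) forces the native library to load, which is exactly what the failing Parquet write needs.

// Paste into spark-shell: probes both the classpath and the native binding
try {
  Class.forName("org.xerial.snappy.Snappy")
  println("snappy-java native version: " + org.xerial.snappy.Snappy.getNativeLibraryVersion)
} catch {
  // UnsatisfiedLinkError is an Error, not an Exception, so catch Throwable here
  case e: Throwable => println("snappy probe failed: " + e)
}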
4. So I picked the newest snappy-java-x.x.x.jar of the ones I found and copied it into {SPARK_HOME}/jars. Remember that the copy has to be made on every node of the cluster (see the check sketched below).
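To double-check that the jar landed where Spark looks for it, the jars directory can be listed from spark-shell. A minimal sketch, assuming the SPARK_HOME environment variable is set (sys.env throws if it is not):

import java.io.File
// List everything snappy-related under $SPARK_HOME/jars;
// exactly one snappy-java jar should show up
val jarsDir = new File(sys.env("SPARK_HOME"), "jars")
Option(jarsDir.listFiles).getOrElse(Array.empty[File])
  .filter(_.getName.contains("snappy"))
  .foreach(f => println(f.getName))

Run this on every node: an executor whose jars directory still lacks snappy-java will keep throwing the same UnsatisfiedLinkError.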
5. Once the copy is done, restart Spark, launch spark-shell, and run the example again.
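A quick smoke test after the restart is to write any small DataFrame as Parquet, since that exercises the snappy codec directly. This is a throwaway sketch; the path /tmp/snappy_check is made up for illustration:

import spark.implicits._
// Parquet output is snappy-compressed by default, so this fails fast if the fix did not take
val probeDF = Seq((1, 1), (2, 4)).toDF("value", "square")
probeDF.write.mode("overwrite").parquet("/tmp/snappy_check")
spark.read.parquet("/tmp/snappy_check").show()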
That resolved the error, and because the jar now sits in {SPARK_HOME}/jars on every node, this fixes the problem once and for all.