Error while exporting Spark SQL DataFrame to CSV
I have referred to the link below to learn how to export a Spark SQL DataFrame to CSV in Python.
My code:
I am loading the job using spark-submit, passing the master URL and the following jars.
I am getting the following error:
df.select('Consigner', 'AverageScore', 'Trips').write.format('com.databricks.spark.csv').options(header='true').save('file:///opt/BIG-DATA/VisualCargo/output/top_consigner.csv')
File "/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/lib/spark/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 332, in save
File "/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/lib/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 538, in __call__
File "/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/lib/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 36, in deco
File "/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/lib/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o156.save.
: java.lang.NoSuchMethodError: scala.Predef$.$conforms()Lscala/Predef$$less$colon$less;
at com.databricks.spark.csv.util.CompressionCodecs$.<init>(CompressionCodecs.scala:29)
at com.databricks.spark.csv.util.CompressionCodecs$.<clinit>(CompressionCodecs.scala)
at com.databricks.spark.csv.DefaultSource.createRelation(DefaultSource.scala:198)
at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:170)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:146)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:137)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
at py4j.Gateway.invoke(Gateway.java:259)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:207)
at java.lang.Thread.run(Thread.java:745)
Spark version 1.5.0-cdh5.5.1 is built with Scala 2.10, the default Scala version for Spark < 2.0. Your spark-csv is built with Scala 2.11: spark-csv_2.11-1.5.0.jar.
Please either downgrade spark-csv to a Scala 2.10 build or upgrade Spark to Scala 2.11. You can tell the Scala version from the suffix after the artifactId, e.g. spark-csv_2.10-1.5.0 is built for Scala 2.10.
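For example, a minimal sketch of submitting the job with the matching Scala 2.10 artifact pulled via `--packages` (the master URL and script name here are placeholders; verify the coordinates on search.maven.org):

```shell
# Let spark-submit resolve the Scala 2.10 build of spark-csv from Maven Central
# instead of shipping a mismatched local jar.
spark-submit \
  --master yarn \
  --packages com.databricks:spark-csv_2.10:1.5.0 \
  top_consigner_job.py
```

With `--packages`, the dependency and its transitive dependencies are downloaded and placed on both the driver and executor classpaths, so a stale spark-csv_2.11 jar passed via `--jars` should be removed to avoid shadowing.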
Spark version: 1.5.0-cdh5.5.1 –
@Hardik Yes, so it is a Scala version conflict. Please downgrade your spark-csv to a Scala 2.10 build - http://search.maven.org/#artifactdetails%7Ccom.databricks%7Cspark-csv_2.11%7C1.5.0%7Cjar –
Thank you very much, downgrading my spark-csv jar to the 2.10 build works. But now it creates multiple part-files inside the output folder. Is there a way to control this? I tried write.repartition(1).format("com.databricks.spark.csv") but it throws an error. –
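The error in that attempt is that `repartition` is a method on the DataFrame, not on the `DataFrameWriter` returned by `.write`, so it must be called before `.write`. A sketch of the fix, reusing the `df`, columns, and output path from the question (untested against a live cluster):

```python
# Collapse the DataFrame to a single partition before writing,
# so Spark produces one part-file instead of one per partition.
(df.select('Consigner', 'AverageScore', 'Trips')
   .coalesce(1)  # repartition(1) also works, but coalesce avoids a full shuffle
   .write
   .format('com.databricks.spark.csv')
   .options(header='true')
   .save('file:///opt/BIG-DATA/VisualCargo/output/top_consigner.csv'))
```

Note that Spark still writes a directory named `top_consigner.csv` containing a single `part-00000` file; a bare CSV file would require renaming or merging it afterwards.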
Looks like a jar conflict to me. Possibly some dependency of the CSV writer. – LiMuBei
@LiMuBei It is a Scala version conflict. –