PostgreSQL uuid[] to Cassandra: conversion error

Problem description:

My job throws java.lang.ClassCastException: [Ljava.util.UUID; cannot be cast to [Ljava.lang.String;

The job reads from a PostgreSQL table containing a user_ids column of type uuid[], and I get the above error when I try to save the data to Cassandra.

However, the same table created by hand on Cassandra with user_ids list<text> works fine!

I cannot change the type on the source table, because I am reading from a legacy system.

Looking at the point the log refers to, in the class org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils.scala:

```scala
case StringType =>
  (array: Object) =>
    array.asInstanceOf[Array[java.lang.String]]
      .map(UTF8String.fromString)
```

Debug view


Stack trace:

Caused by: java.lang.ClassCastException: [Ljava.util.UUID; cannot be cast to [Ljava.lang.String; 
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$14.apply(JdbcUtils.scala:443) 
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$14.apply(JdbcUtils.scala:442) 
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$makeGetter$13$$anonfun$18.apply(JdbcUtils.scala:472) 
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$makeGetter$13$$anonfun$18.apply(JdbcUtils.scala:472) 
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$nullSafeConvert(JdbcUtils.scala:482) 
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$makeGetter$13.apply(JdbcUtils.scala:470) 
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$makeGetter$13.apply(JdbcUtils.scala:469) 
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anon$1.getNext(JdbcUtils.scala:330) 
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anon$1.getNext(JdbcUtils.scala:312) 
at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73) 
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37) 
at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32) 
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source) 
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) 
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:395) 
at org.apache.spark.sql.execution.columnar.InMemoryRelation$$anonfun$1$$anon$1.hasNext(InMemoryRelation.scala:133) 
at org.apache.spark.storage.memory.MemoryStore.putIteratorAsValues(MemoryStore.scala:215) 
at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1038) 
at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1029) 
at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:969) 
at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1029) 
at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:760) 
at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:334) 
at org.apache.spark.rdd.RDD.iterator(RDD.scala:285) 
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) 
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) 
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287) 
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) 
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) 
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287) 
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) 
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) 
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287) 
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87) 
at org.apache.spark.scheduler.Task.run(Task.scala:108) 
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335) 
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
at java.lang.Thread.run(Thread.java:748) 

You could try casting the array, array.asInstanceOf[Array[UUID]], and then converting that new array to strings, i.e. newArray.map(_.toString) –
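The comment's suggestion amounts to casting the object the JDBC driver returns to a UUID array and mapping each element to its string form. A minimal sketch of that conversion in plain Java (the class and method names are illustrative, not part of Spark):

```java
import java.util.Arrays;
import java.util.UUID;

public class UuidArrayToStrings {
    // For a PostgreSQL uuid[] column, java.sql.Array#getArray() returns a
    // java.util.UUID[] boxed as Object; cast it and stringify each element.
    static String[] toStrings(Object array) {
        return Arrays.stream((UUID[]) array)
                     .map(UUID::toString)
                     .toArray(String[]::new);
    }

    public static void main(String[] args) {
        UUID[] ids = { UUID.fromString("123e4567-e89b-12d3-a456-426614174000") };
        System.out.println(Arrays.toString(toStrings(ids)));
    }
}
```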

See the data type support in CQL here.

You should have created list<uuid> instead of list<text> in your table schema. The Java driver cannot handle this conversion automatically.

If you want to use text, convert the values to strings before sending them to the driver.
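A sketch of the suggested schema change in CQL (the keyspace, table, and key column are illustrative; only the user_ids column comes from the question):

```sql
-- Use Cassandra's native uuid type so the driver can bind
-- java.util.UUID values directly, with no string conversion
CREATE TABLE my_keyspace.users (
    id uuid PRIMARY KEY,
    user_ids list<uuid>
);
```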


Thanks! I would do that, but I am trying to handle this conversion with a single generic job, and the type is created automatically by the driver. Sadly, I will have to write a specific job for this table. Thanks again. –

The user_ids values you store in the database are of type uuid, and the equivalent type in Java is java.util.UUID.

So instead of java.lang.String, use an array or list of java.util.UUID, and then call uuid_obj.toString() on each element when storing into Cassandra.
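A small sketch of that approach: keep the values as java.util.UUID while processing, and stringify only at write time for a list<text> column (the helper name is illustrative). Note that UUID#toString always produces the canonical lowercase form, regardless of the input's case:

```java
import java.util.List;
import java.util.UUID;
import java.util.stream.Collectors;

public class UuidRoundTrip {
    // Converts a list of UUIDs into the strings a Cassandra list<text>
    // column expects.
    static List<String> forCassandra(List<UUID> ids) {
        return ids.stream().map(UUID::toString).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // UUID.fromString accepts uppercase hex but normalizes on toString
        UUID id = UUID.fromString("123E4567-E89B-12D3-A456-426614174000");
        System.out.println(forCassandra(List.of(id)));
    }
}
```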