Spark using Scala: write a null-like field value in Cassandra instead of TupleValue

Problem description:

In one of my collections, let's say I have the following field:

f: frozen<tuple<text, set<text>>>

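Let's say I want to use a Scala script to insert an entry in which this particular field is empty, null, nonexistent, etc. Before inserting, I map the entry's field like this: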

sRow("fk") = null // or None, or maybe I simply don't specify the field at all

When I try to run the Spark script (from Databricks, Spark connector version 1.6), I get the following error:

org.apache.spark.SparkException: Job aborted due to stage failure: Task 6 in stage 133.0 failed 1 times, most recent failure: Lost task 6.0 in stage 133.0 (TID 447, localhost): com.datastax.spark.connector.types.TypeConversionException: Cannot convert object null to com.datastax.spark.connector.TupleValue. 
    at com.datastax.spark.connector.types.TypeConverter$$anonfun$convert$1.apply(TypeConverter.scala:47) 
    at com.datastax.spark.connector.types.TypeConverter$$anonfun$convert$1.apply(TypeConverter.scala:43)

When I use None instead of null, I still get an error, although a different one:

org.apache.spark.SparkException: Job aborted due to stage failure: Task 2 in stage 143.0 failed 1 times, most recent failure: Lost task 2.0 in stage 143.0 (TID 474, localhost): java.lang.IllegalArgumentException: requirement failed: Expected 2 components, instead of 0 
    at scala.Predef$.require(Predef.scala:233) 
    at com.datastax.spark.connector.types.TupleType.newInstance(TupleType.scala:55)
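A plausible reading of the two traces (an interpretation, not something the connector documentation states): the tuple converter has no case that accepts `null`, and `None` is itself a Scala `Product` with zero elements, so treating it as a tuple yields 0 components where `tuple<text, set<text>>` needs 2. The pure-Scala snippet below only shows the arities involved; the object name `ProductArity` is hypothetical and not part of the connector.

```scala
// Hypothetical helper, not part of the Spark Cassandra connector:
// reports how many components a value would contribute if treated as a tuple.
object ProductArity {
  def arity(x: Any): Option[Int] = x match {
    case p: Product => Some(p.productArity) // tuples, case classes and None are all Products
    case _          => None                 // null matches no type pattern, so it lands here
  }
}
```

Under this reading, `ProductArity.arity(("a", Set("b")))` is `Some(2)`, `ProductArity.arity(None)` is `Some(0)` and `ProductArity.arity(null)` is `None`, which is consistent with "Expected 2 components, instead of 0" and "Cannot convert object null" respectively.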

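I understand that Cassandra has no exact concept of null, but I know there is a way to leave a value out when inserting an entry into Cassandra, because I have done it from other environments, for example with Cassandra's Node.js driver. How can I force a null-like value when inserting into a column that expects a TupleValue or some user-defined type?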

Answer

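For modern versions of Cassandra (2.2 and later, which support native protocol v4), you can use the "Unset" feature to make it actually skip null values. This is probably the best fit for your use case, because writing a null implicitly writes a tombstone.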

See Treating nulls as Unset.

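// Assumes an existing SparkContext `sc`, a keyspace name `ks`, and a table tab1
// with three int columns, the first of which is the primary key.
import com.datastax.spark.connector._                 // saveToCassandra, cassandraTable
import com.datastax.spark.connector.writer.WriteConf

// Set up original data (1, 1, 1) --> (6, 6, 6)
sc.parallelize(1 to 6).map(x => (x, x, x)).saveToCassandra(ks, "tab1")

val ignoreNullsWriteConf = WriteConf.fromSparkConf(sc.getConf).copy(ignoreNulls = true)
// These writes will not delete anything because nulls are ignored (sent as unset)
sc.parallelize(1 to 6)
    .map(x => (x, None, None))
    .saveToCassandra(ks, "tab1", writeConf = ignoreNullsWriteConf)

val results = sc.cassandraTable[(Int, Int, Int)](ks, "tab1").collect

results
/**
    (1, 1, 1),
    (2, 2, 2),
    (3, 3, 3),
    (4, 4, 4),
    (5, 5, 5),
    (6, 6, 6)
**/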
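To make the difference between an explicit null and an unset column concrete, here is a small pure-Scala model. It is a hypothetical illustration of the semantics described in this answer, not the connector's implementation: `Bound`, `Written`, `Tombstone`, `Unset` and `UnsetModel` are made-up names, and `ignoreNulls` here is just a Boolean parameter that mirrors the `WriteConf` setting.

```scala
// Hypothetical model of what one CQL write does to each column; not connector code.
sealed trait Bound
final case class Written(value: Any) extends Bound // a concrete value is sent
case object Tombstone extends Bound                // explicit null: the old value is deleted
case object Unset extends Bound                    // nothing is sent: the old value survives

object UnsetModel {
  // Mirrors the idea behind ignoreNulls: a missing value becomes Unset instead of null.
  def bind(value: Option[Any], ignoreNulls: Boolean): Bound = value match {
    case Some(v)             => Written(v)
    case None if ignoreNulls => Unset
    case None                => Tombstone
  }

  // Applies one write to a stored row, modeled as a map from column name to value.
  def applyWrite(stored: Map[String, Any], write: Map[String, Bound]): Map[String, Any] =
    write.foldLeft(stored) {
      case (row, (col, Written(v))) => row.updated(col, v)
      case (row, (col, Tombstone))  => row - col // a deleted column reads back as null
      case (row, (_, Unset))        => row       // column untouched
    }
}
```

With `ignoreNulls = true`, every `None` becomes `Unset`, so applying the write leaves the stored row unchanged, matching the `(1, 1, 1)` to `(6, 6, 6)` output above. With `ignoreNulls = false`, the same `None` becomes a tombstone and the column's previous value disappears.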

There is also much more fine-grained control available; see the Full Docs.
