Exception: Some of types cannot be determined by the first 100 rows, please try again with sampling

Two ways to convert an RDD to a DataFrame are covered here:

1. Map the RDD to Row objects, then create the DataFrame

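from pyspark.sql import Row
# Wrap each record in a Row so that column names are attached to the values
rdd = stringCSVRDD.map(lambda p: Row(id=p[0], name=p[1], age=p[2], eyeColor=p[3]))
df = spark.createDataFrame(rdd)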

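This approach is concise: no schema has to be written, and the column types are inferred automatically from the values in the first 100 rows.

If the type of a column cannot be inferred (typically because that column is None in every one of those rows), an exception is raised: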

ValueError: Some of types cannot be determined by the first 100 rows, please try again with sampling
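To make the failure mode concrete, here is a simplified pure-Python model of first-rows type inference. It is only a sketch and not Spark's actual implementation: the helper name `infer_column_types` and the constant `SAMPLE_SIZE` are made up for illustration, and type names are plain Python type names rather than Spark types.

```python
from typing import Any, Dict, List, Optional

SAMPLE_SIZE = 100  # assumption: mirrors the "first 100 rows" in the error message


def infer_column_types(rows: List[Dict[str, Any]]) -> Dict[str, str]:
    """Hypothetical helper: guess one type name per column from the leading rows."""
    inferred: Dict[str, Optional[str]] = {}
    for row in rows[:SAMPLE_SIZE]:
        for name, value in row.items():
            inferred.setdefault(name, None)
            # A None value carries no type information, so keep looking.
            if inferred[name] is None and value is not None:
                inferred[name] = type(value).__name__
    unresolved = [name for name, t in inferred.items() if t is None]
    if unresolved:
        raise ValueError("cannot determine types for columns: %s" % unresolved)
    return {name: t for name, t in inferred.items() if t is not None}


# 'age' is None in the first 100 rows and only has a value in row 101.
rows = [{"id": i, "age": None} for i in range(100)] + [{"id": 100, "age": 30}]
try:
    infer_column_types(rows)
except ValueError as exc:
    print("inference failed:", exc)
```

In this model the value in row 101 is never inspected, which is why a column that starts with a long run of None values cannot be typed by inference alone.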

When this error occurs, create the DataFrame the second way, which specifies the type of each column explicitly.

2. Define a schema

from pyspark.sql.types import StructType, StructField, LongType, StringType
schema = StructType([
    StructField('id', LongType(), True),
    StructField('name', StringType(), True),
    StructField('age', LongType(), True),
    StructField('eyeColor', StringType(), True),
])
# Apply the schema to the RDD and create the DataFrame
df = spark.createDataFrame(stringCSVRDD, schema)
