Exception: Some of types cannot be determined by the first 100 rows, please try again with sampling


1. Convert the RDD to Row objects, then create the DataFrame

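from pyspark.sql import Row
# Build Row objects and let Spark infer each column's type (this is the call that fails)
rdd = stringCSVRDD.map(lambda p: Row(id=p[0], name=p[1], age=p[2], eyeColor=p[3]))
df = spark.createDataFrame(rdd)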



ValueError: Some of types cannot be determined by the first 100 rows, please try again with sampling
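
The error comes from schema inference: when no schema is given, Spark looks at the leading rows to decide each column's type, and a column that is None in all of them cannot be typed. The following is a simplified pure-Python model of that idea, not Spark's actual implementation; `infer_types`, its `limit` parameter, and the sample rows are hypothetical and exist only for illustration.

```python
def infer_types(rows, limit=100):
    """Toy model: infer one type name per column from the first `limit` rows."""
    types = {}
    for row in rows[:limit]:
        for name, value in row.items():
            if value is not None and types.get(name) is None:
                # First non-None value seen for this column decides its type
                types[name] = type(value).__name__
            else:
                # Remember the column even if its type is still unknown
                types.setdefault(name, None)
    unresolved = sorted(name for name, t in types.items() if t is None)
    if unresolved:
        raise ValueError(f"cannot determine types for columns: {unresolved}")
    return types


# Made-up data: 'age' is None in the first 100 rows and only set in row 101
rows = [{"id": 1, "name": "Katie", "age": None, "eyeColor": "brown"}] * 100
rows.append({"id": 2, "name": "Michael", "age": 22, "eyeColor": "green"})

try:
    infer_types(rows)
    failed = False
except ValueError:
    failed = True  # 'age' stayed unresolved within the first 100 rows
```

In this model, widening the window (for example `limit=101`) lets the column resolve, which is roughly what the "try again with sampling" hint points at; supplying the types yourself avoids the inference step altogether.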



2. Define the schema explicitly

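from pyspark.sql.types import StructType, StructField, LongType, StringType
# Declare each column's type up front so Spark does not have to infer it
schema = StructType([
    StructField('id', LongType(), True),
    StructField('name', StringType(), True),
    StructField('age', LongType(), True),
    StructField('eyeColor', StringType(), True),
])
# Apply the schema to the RDD and create the DataFrame
df = spark.createDataFrame(stringCSVRDD, schema)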
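
One caveat worth sketching: if the fields in the RDD are still strings, as the name stringCSVRDD might suggest (an assumption; the source does not show its contents), the LongType columns will reject them during schema verification, so the values need to be cast first. A minimal sketch using a hypothetical helper `parse_row`:

```python
def parse_row(fields):
    """Cast raw fields to (id, name, age, eyeColor) so they match the schema."""
    raw_id, name, raw_age, eye_color = fields
    # Treat an empty or missing age as None; the schema marks it nullable
    age = int(raw_age) if raw_age not in (None, "") else None
    return (int(raw_id), name, age, eye_color)


# Intended use (needs a live SparkSession, so it is left commented out):
# df = spark.createDataFrame(stringCSVRDD.map(parse_row), schema)
```

If the RDD already holds integers for id and age, the helper leaves them unchanged, so applying it is harmless in either case.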