Error when starting pyspark from the command line on Windows

Install Hadoop

https://www.cnblogs.com/chevin/p/9090683.html

Install Spark

https://www.cnblogs.com/chevin/p/11064854.html

There is a pitfall here: the machine originally had Python 3.8 installed, and starting pyspark from the command line kept failing with the error shown below.


Spark and the SparkContext could not be initialized correctly:

Traceback (most recent call last):
  File "D:\spark-2.4.5-bin-hadoop2.7\python\pyspark\shell.py", line 31, in <module>
    from pyspark import SparkConf
  File "D:\spark-2.4.5-bin-hadoop2.7\python\pyspark\__init__.py", line 51, in <module>
    from pyspark.context import SparkContext
  File "D:\spark-2.4.5-bin-hadoop2.7\python\pyspark\context.py", line 31, in <module>
    from pyspark import accumulators
  File "D:\spark-2.4.5-bin-hadoop2.7\python\pyspark\accumulators.py", line 97, in <module>
    from pyspark.serializers import read_int, PickleSerializer
  File "D:\spark-2.4.5-bin-hadoop2.7\python\pyspark\serializers.py", line 72, in <module>
    from pyspark import cloudpickle
  File "D:\spark-2.4.5-bin-hadoop2.7\python\pyspark\cloudpickle.py", line 145, in <module>
    _cell_set_template_code = _make_cell_set_template_code()
  File "D:\spark-2.4.5-bin-hadoop2.7\python\pyspark\cloudpickle.py", line 126, in _make_cell_set_template_code
    return types.CodeType(
TypeError: an integer is required (got type bytes)
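
The underlying reason is that Python 3.8 changed the constructor of types.CodeType (a new posonlyargcount parameter was inserted), so the old cloudpickle bundled with Spark 2.4.5 passes its arguments in the wrong positions. A minimal sketch to confirm which interpreter pyspark will pick up and what its CodeType constructor expects (this snippet is illustrative and not from the original post):

import sys
import types

# Spark 2.4.5 ships a cloudpickle that calls types.CodeType with the
# pre-3.8 argument order. On Python 3.8+ the constructor gained a new
# posonlyargcount parameter, so a bytes object lands where an int is
# expected, producing "TypeError: an integer is required (got type bytes)".
print(sys.version)                 # the interpreter pyspark will use
print(types.CodeType.__doc__)      # constructor signature on this interpreter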

From the official site https://www.python.org/downloads/windows/ I downloaded Python 3.6, installed it, reconfigured the environment variables, and after that pyspark started normally.
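
Besides reinstalling, Spark can also be pointed explicitly at a compatible interpreter through the PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON environment variables. A rough sketch, assuming Python 3.6 was installed to C:\Python36 (the path is an assumption; adjust it to your own install):

import os

# Assumed install location of the compatible interpreter.
py36 = r"C:\Python36\python.exe"

# Spark reads these variables to choose the worker and driver interpreters,
# so pyspark uses Python 3.6 even if another Python comes first on PATH.
os.environ["PYSPARK_PYTHON"] = py36
os.environ["PYSPARK_DRIVER_PYTHON"] = py36

The same two variables can be set as Windows system environment variables, which is what the pyspark launcher reads when you start it from the command line.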
