spark-shell && spark-submit
In Spark's bin directory there are two scripts, spark-shell and spark-submit. Their --help output shows that they accept essentially the same options, so how are the two scripts related?
When we run spark-shell, the web UI shows the application name as "Spark shell". With these questions in mind, let's look at what the spark-shell and spark-submit scripts actually contain.
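For a quick comparison, print the usage of both scripts yourself (the exact output varies by Spark version; spark-shell adds a small "Scala REPL options" section on top of the common options):
./bin/spark-shell --help
./bin/spark-submit --help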
spark-shell
cygwin=false
case "$(uname)" in
CYGWIN*) cygwin=true;;
esac
# Enter posix mode for bash
set -o posix
if [ -z "${SPARK_HOME}" ]; then
source "$(dirname "$0")"/find-spark-home
fi
export _SPARK_CMD_USAGE="Usage: ./bin/spark-shell [options]
Scala REPL options:
-I <file> preload <file>, enforcing line-by-line interpretation"
SPARK_SUBMIT_OPTS="$SPARK_SUBMIT_OPTS -Dscala.usejavacp=true"
function main() {
if $cygwin; then
stty -icanon min 1 -echo > /dev/null 2>&1
export SPARK_SUBMIT_OPTS="$SPARK_SUBMIT_OPTS -Djline.terminal=unix"
"${SPARK_HOME}"/bin/spark-submit --class org.apache.spark.repl.Main --name "Spark shell" "[email protected]"
stty icanon echo > /dev/null 2>&1
else
export SPARK_SUBMIT_OPTS
"${SPARK_HOME}"/bin/spark-submit --class org.apache.spark.repl.Main --name "Spark shell" "[email protected]"
fi
}
# Copy restore-TTY-on-exit functions from Scala script so spark-shell exits properly even in
# binary distribution of Spark where Scala is not installed
exit_status=127
saved_stty=""
function restoreSttySettings() {
stty $saved_stty
saved_stty=""
}
function onExit() {
if [[ "$saved_stty" != "" ]]; then
restoreSttySettings
fi
exit $exit_status
}
# to reenable echo if we are interrupted before completing.
trap onExit INT
# save terminal settings
saved_stty=$(stty -g 2>/dev/null)
# clear on error so we don't later try to restore them
if [[ ! $? ]]; then
saved_stty=""
fi
main "[email protected]"
# record the exit status lest it be overwritten:
# then reenable echo and propagate the code.
exit_status=$?
onExit
Let's go straight to the key part.
In the main function of spark-shell, the script simply calls the spark-submit script and specifies --name "Spark shell".
That explains why the application name shown in the web UI is "Spark shell" when we run spark-shell. We can also see that the main class is set to org.apache.spark.repl.Main; it is reasonable to guess that this class creates the SparkContext for us, since a ready-to-use SparkContext is already there when spark-shell starts, as sketched below.
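In other words, starting the REPL boils down to a spark-submit call like the one below (a sketch that ignores the Cygwin/terminal handling; the --master value is just an example of an extra flag you might pass to spark-shell, which gets forwarded via "$@"):
export SPARK_SUBMIT_OPTS="$SPARK_SUBMIT_OPTS -Dscala.usejavacp=true"
"${SPARK_HOME}"/bin/spark-submit \
  --class org.apache.spark.repl.Main \
  --name "Spark shell" \
  --master local[2]
# In the REPL that comes up, the SparkContext is already available as the variable sc.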
spark-submit
if [ -z "${SPARK_HOME}" ]; then
source "$(dirname "$0")"/find-spark-home
fi
# disable randomized hash for string in Python 3.3+
export PYTHONHASHSEED=0
exec "${SPARK_HOME}"/bin/spark-class org.apache.spark.deploy.SparkSubmit "[email protected]"
spark-submit is fairly simple: it just execs the spark-class script, specifying org.apache.spark.deploy.SparkSubmit as the main class and forwarding all of the arguments it received.
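For example, a submission like the one below (the application class and jar are hypothetical placeholders) is passed straight through to spark-class, with org.apache.spark.deploy.SparkSubmit prepended as the main class:
# What the user types (hypothetical example application)
./bin/spark-submit --class com.example.WordCount --master yarn ./wordcount.jar input.txt

# What the spark-submit script actually execs (same arguments, SparkSubmit prepended)
./bin/spark-class org.apache.spark.deploy.SparkSubmit \
  --class com.example.WordCount --master yarn ./wordcount.jar input.txt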
spark-class
if [ -z "${SPARK_HOME}" ]; then
source "$(dirname "$0")"/find-spark-home
fi
. "${SPARK_HOME}"/bin/load-spark-env.sh
# Find the java binary
if [ -n "${JAVA_HOME}" ]; then
RUNNER="${JAVA_HOME}/bin/java"
else
if [ "$(command -v java)" ]; then
RUNNER="java"
else
echo "JAVA_HOME is not set" >&2
exit 1
fi
fi
# Find Spark jars.
if [ -d "${SPARK_HOME}/jars" ]; then
SPARK_JARS_DIR="${SPARK_HOME}/jars"
else
SPARK_JARS_DIR="${SPARK_HOME}/assembly/target/scala-$SPARK_SCALA_VERSION/jars"
fi
if [ ! -d "$SPARK_JARS_DIR" ] && [ -z "$SPARK_TESTING$SPARK_SQL_TESTING" ]; then
echo "Failed to find Spark jars directory ($SPARK_JARS_DIR)." 1>&2
echo "You need to build Spark with the target \"package\" before running this program." 1>&2
exit 1
else
LAUNCH_CLASSPATH="$SPARK_JARS_DIR/*"
fi
if [ -n "$SPARK_PREPEND_CLASSES" ]; then
LAUNCH_CLASSPATH="${SPARK_HOME}/launcher/target/scala-$SPARK_SCALA_VERSION/classes:$LAUNCH_CLASSPATH"
fi
# For tests
if [[ -n "$SPARK_TESTING" ]]; then
unset YARN_CONF_DIR
unset HADOOP_CONF_DIR
fi
build_command() {
"$RUNNER" -Xmx128m -cp "$LAUNCH_CLASSPATH" org.apache.spark.launcher.Main "[email protected]"
printf "%d\0" $?
}
set +o posix
CMD=()
while IFS= read -d '' -r ARG; do
CMD+=("$ARG")
done < <(build_command "$@")
COUNT=${#CMD[@]}
LAST=$((COUNT - 1))
LAUNCHER_EXIT_CODE=${CMD[$LAST]}
if ! [[ $LAUNCHER_EXIT_CODE =~ ^[0-9]+$ ]]; then
echo "${CMD[@]}" | head -n-1 1>&2
exit 1
fi
if [ $LAUNCHER_EXIT_CODE != 0 ]; then
exit $LAUNCHER_EXIT_CODE
fi
CMD=("${CMD[@]:0:$LAST}")
exec "${CMD[@]}"
The first part checks the Spark and Java environments (finding the java binary via JAVA_HOME or the PATH).
The next part locates Spark's jars; for example, the different main classes specified by the spark-shell and spark-submit scripts (org.apache.spark.repl.Main and org.apache.spark.deploy.SparkSubmit) should all be found in those jars. Finally, the script uses org.apache.spark.launcher.Main to build the actual java command and execs it, as sketched below.
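That last step deserves a closer look: build_command prints the launcher's output as NUL-separated tokens followed by an exit status; the while loop reads them into the CMD array, the final element is treated as the launcher's exit code, and the remaining elements are exec'd as the real command. Below is a minimal, self-contained sketch of the same pattern; emit_command is a made-up stand-in for the launcher, not part of Spark:
#!/usr/bin/env bash
# Stand-in for "java ... org.apache.spark.launcher.Main": print the final command
# as NUL-separated tokens, then append the exit status the same way spark-class does.
emit_command() {
  printf '%s\0' echo "launcher built this command for:" "$@"
  printf '%d\0' $?
}

CMD=()
while IFS= read -d '' -r ARG; do
  CMD+=("$ARG")                      # collect each NUL-delimited token
done < <(emit_command "$@")

LAST=$(( ${#CMD[@]} - 1 ))
LAUNCHER_EXIT_CODE=${CMD[$LAST]}     # last token is the exit status
if [ "$LAUNCHER_EXIT_CODE" != 0 ]; then
  exit "$LAUNCHER_EXIT_CODE"
fi
exec "${CMD[@]:0:$LAST}"             # replace this shell with the assembled command
Running the sketch as ./sketch.sh --master yarn simply echoes the arguments back, which illustrates how every option you type eventually reaches the java command the launcher assembles.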