Spark SQL 1

Spark SQL: introduced in Spark 1.0
Shark: the predecessor that Spark SQL replaced

Spark SQL ==> belongs to the Spark project
Hive on Spark ==> belongs to the Hive project (Hive with Spark as the execution engine)

SparkSQL on Hive ........

Spark SQL is not about SQL
Spark SQL is about more than SQL

Spark SQL is Apache Spark's module for working with structured data.

MetaStore Datacenter key *****

Data Source API

RDD

SchemaRDD
DataFrame Dataset

R, pandas: single node
===> Spark: similar usage

JD interview question: in which version was each of Spark's major features introduced?

A Dataset is a distributed collection of data
A DataFrame is a Dataset organized into named columns.
DataFrame = Table(column_name, column_type, column_value)
DataFrame = Dataset[Row]
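
The "Table(column_name, column_type, column_value)" view can be sketched in plain Python. This is only a toy, single-process model, not Spark's API: the names Field and ToyDataFrame are hypothetical and exist purely to show a schema (names plus types) sitting on top of a collection of rows.

```python
from dataclasses import dataclass
from typing import Any, List, Tuple


@dataclass(frozen=True)
class Field:
    name: str    # column_name
    dtype: type  # column_type


@dataclass
class ToyDataFrame:
    """Hypothetical stand-in: a schema plus a list of rows (column_value)."""
    schema: List[Field]
    rows: List[Tuple[Any, ...]]

    def __post_init__(self):
        # Check every row against the schema, which a schemaless RDD cannot do.
        for row in self.rows:
            if len(row) != len(self.schema):
                raise ValueError(f"row {row!r} does not match schema width")
            for value, field in zip(row, self.schema):
                if not isinstance(value, field.dtype):
                    raise TypeError(f"{field.name}: expected {field.dtype.__name__}")

    def _index(self, name: str) -> int:
        for i, field in enumerate(self.schema):
            if field.name == name:
                return i
        raise KeyError(name)

    def select(self, *names: str) -> "ToyDataFrame":
        # Columns are addressed by name, as in a table.
        idx = [self._index(n) for n in names]
        return ToyDataFrame(
            [self.schema[i] for i in idx],
            [tuple(row[i] for i in idx) for row in self.rows],
        )


people = ToyDataFrame(
    schema=[Field("name", str), Field("age", int)],
    rows=[("Alice", 30), ("Bob", 25)],  # made-up sample data
)
ages = people.select("age")
```

Because the schema is known up front, a column can be selected by name and a malformed row is rejected early. That schema information is what separates a DataFrame from a plain RDD of opaque objects.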

DataFrame vs RDD

DataFrame ===> performance is the same regardless of the API language
RDD: large performance gap between Scala and Python

Hive AST: abstract syntax tree

Spark SQL Story
Write less code
Read less data
Let the optimizer do the hard work

explain extended
select b1.key * (2+3), b2.value
from b b1 join b b2
on b1.key = b2.key and b1.key>10;
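
In the extended plan for a query like this, one would expect the optimized logical plan to show the constant expression (2+3) already folded to 5 and the key > 10 filter applied below the join. The sketch below is a toy illustration of constant folding only, written in plain Python. It is not Catalyst code, and the tuple-based expression tree and the fold_constants name are assumptions made for the example.

```python
import operator

# Supported binary operators for this toy expression tree.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def fold_constants(expr):
    """Replace any operator node whose operands are both literals with one literal.

    Nodes are tuples: ("lit", value), ("col", name), or (op, left, right).
    """
    if expr[0] in ("lit", "col"):
        return expr  # leaves are already as simple as they can be
    op, left, right = expr
    left, right = fold_constants(left), fold_constants(right)
    if left[0] == "lit" and right[0] == "lit":
        return ("lit", OPS[op](left[1], right[1]))
    return (op, left, right)


# b1.key * (2 + 3)
projection = ("*", ("col", "b1.key"), ("+", ("lit", 2), ("lit", 3)))
folded = fold_constants(projection)  # b1.key * 5
```

The folded tree computes 2+3 once at planning time instead of once per row, a small instance of "let the optimizer do the hard work".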

select age,count(1) from xx group by age;
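
The aggregation above can be tried without a cluster. The sketch below uses Python's built-in sqlite3 module as a stand-in for Spark SQL. The contents of table xx and its name column are made up for illustration, and ORDER BY is added only to make the output order deterministic.

```python
import sqlite3

# In-memory database standing in for a registered table named xx.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE xx (name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO xx VALUES (?, ?)",
    [("a", 20), ("b", 20), ("c", 31)],  # hypothetical sample rows
)

# Same query as in the notes: count the rows per distinct age.
result = conn.execute(
    "SELECT age, count(1) FROM xx GROUP BY age ORDER BY age"
).fetchall()
conn.close()
```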

Temp: application
