Spark SQL 1.0
Shark
Spark SQL ==> Spark
Hive on Spark ==> Hive
SparkSQL on Hive ........
Spark SQL is not about SQL
Spark SQL is about more than SQL
Spark SQL is Apache Spark's module for working with structured data.
MetaStore Datacenter key *****
Data Source API
RDD
SchemaRDD
DataFrame Dataset
R / Pandas: single node
===> Spark: similar usage, but distributed
JD interview question: in which version was each major Spark feature introduced?
A Dataset is a distributed collection of data.
A DataFrame is a Dataset organized into named columns.
DataFrame = Table(column_name, column_type, column_value)
DataFrame = Dataset[Row]
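The Table(column_name, column_type, column_value) view can be sketched in plain Python. This is a toy model only, not Spark's implementation: Column, ToyDataFrame, and the sample rows are hypothetical names made up for illustration. It shows what "organized into named columns" buys you: every row is checked against the schema, and columns can be selected by name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    name: str    # column_name
    dtype: type  # column_type


class ToyDataFrame:
    """Hypothetical toy model: a schema plus rows of column values."""

    def __init__(self, schema, rows):
        self.schema = list(schema)
        self.rows = [tuple(r) for r in rows]
        # Every row must match the schema in length and in value types.
        for row in self.rows:
            if len(row) != len(self.schema):
                raise ValueError(f"row {row!r} does not match the schema")
            for col, value in zip(self.schema, row):
                if not isinstance(value, col.dtype):
                    raise TypeError(f"{col.name}: expected {col.dtype.__name__}")

    def _index(self, name):
        for i, col in enumerate(self.schema):
            if col.name == name:
                return i
        raise KeyError(name)

    def select(self, *names):
        # Keep only the requested columns, addressed by name.
        idx = [self._index(n) for n in names]
        return ToyDataFrame(
            [self.schema[i] for i in idx],
            [tuple(row[i] for i in idx) for row in self.rows],
        )


# Hypothetical sample data.
people = ToyDataFrame(
    [Column("name", str), Column("age", int)],
    [("alice", 30), ("bob", 25)],
)
ages = people.select("age")
```

Because the schema is known up front, an engine can tell which columns a query actually needs, which is part of what makes "read less data" possible.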
DataFrame vs RDD
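DataFrame ===> performance is essentially the same across languages (every API goes through the same optimizer)
RDD: large performance gap between Scala and Python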
Hive AST: abstract syntax tree
Spark SQL Story
Write less code
Read less data
Let the optimizer do the hard work
explain extended
select b1.key * (2+3), b2.value
from b b1 join b b2
on b1.key = b2.key and b1.key>10;
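explain extended prints the parsed, analyzed, and optimized logical plans plus the physical plan. In the optimized plan for the query above, one would expect b1.key * (2+3) to appear as b1.key * 5 (constant folding) and the b1.key>10 filter to be applied before the join rather than after it. The constant-folding step can be sketched in plain Python. This is a toy rewrite over a hypothetical expression tree (Lit, Col, BinOp, and fold_constants are made-up names), not Catalyst's actual code.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Lit:
    value: int  # a literal such as 2 or 3


@dataclass(frozen=True)
class Col:
    name: str  # a column reference such as b1.key


@dataclass(frozen=True)
class BinOp:
    op: str  # "+" or "*"
    left: "Expr"
    right: "Expr"


Expr = Union[Lit, Col, BinOp]

OPS = {"+": lambda a, b: a + b, "*": lambda a, b: a * b}


def fold_constants(e):
    """Replace any sub-expression made only of literals with its value."""
    if isinstance(e, BinOp):
        left = fold_constants(e.left)
        right = fold_constants(e.right)
        if isinstance(left, Lit) and isinstance(right, Lit):
            return Lit(OPS[e.op](left.value, right.value))
        return BinOp(e.op, left, right)
    return e  # literals and column references are left as they are


# b1.key * (2 + 3)
expr = BinOp("*", Col("b1.key"), BinOp("+", Lit(2), Lit(3)))
folded = fold_constants(expr)  # b1.key * 5
```

The point of the sketch: (2+3) is computed once at planning time instead of once per row, and the user did not have to rewrite the query to get that.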
select age,count(1) from xx group by age;
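What the group-by query computes can be shown on a small in-memory list. This is only an illustration of the semantics on a single machine, not how Spark executes it; the rows standing in for table xx are hypothetical sample data.

```python
from collections import Counter

# Hypothetical rows standing in for table xx.
xx = [
    {"name": "alice", "age": 30},
    {"name": "bob", "age": 25},
    {"name": "carol", "age": 30},
]

# select age, count(1) from xx group by age
age_counts = Counter(row["age"] for row in xx)

# Sorted only to make the output order deterministic; SQL gives no such guarantee.
result = sorted(age_counts.items())
```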
Temp: application