Spark SQL 1

Spark SQL: first released in Spark 1.0
Predecessor: Shark


Spark SQL ==> part of the Spark project
Hive on Spark ==> part of the Hive project (Hive using Spark as its execution engine)


Spark SQL on Hive ........

Spark SQL is not about SQL
Spark SQL is about more than SQL


Spark SQL is Apache Spark's module for working with structured data.

MetaStore: the key to the data center *****


Data Source API
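
The Data Source API lets Spark SQL read and write many formats (JSON, CSV, Parquet, ORC, JDBC) through one interface, e.g. `spark.read.format("json").load(path)`. Below is a stdlib-only Python sketch of that plug-in idea, not Spark's implementation: `register`, `READERS` and `load` are hypothetical names, and in-memory strings stand in for file paths.

```python
import csv
import io
import json
from typing import Callable, Dict, List

# Hypothetical, simplified model of a pluggable data source API: readers are
# registered under a format name and callers only choose the name. NOT Spark's
# implementation; it mimics the shape of spark.read.format(name).load(path)
# using in-memory text instead of paths.
Row = Dict[str, object]
READERS: Dict[str, Callable[[str], List[Row]]] = {}


def register(fmt: str):
    """Decorator that plugs a reader into the registry under `fmt`."""
    def wrap(fn: Callable[[str], List[Row]]) -> Callable[[str], List[Row]]:
        READERS[fmt] = fn
        return fn
    return wrap


@register("json")
def read_json_lines(text: str) -> List[Row]:
    # One JSON object per line (JSON Lines), skipping blank lines.
    return [json.loads(line) for line in text.splitlines() if line.strip()]


@register("csv")
def read_csv(text: str) -> List[Row]:
    # First line is the header; values stay strings (no type inference here).
    return [dict(r) for r in csv.DictReader(io.StringIO(text))]


def load(fmt: str, text: str) -> List[Row]:
    """Same call for every format; an unknown format raises KeyError."""
    return READERS[fmt](text)


people_json = '{"name": "Ann", "age": 30}\n{"name": "Bob", "age": 25}\n'
people_csv = "name,age\nAnn,30\nBob,25\n"
print(load("json", people_json))  # [{'name': 'Ann', 'age': 30}, {'name': 'Bob', 'age': 25}]
print(load("csv", people_csv))    # [{'name': 'Ann', 'age': '30'}, {'name': 'Bob', 'age': '25'}]
```

Supporting a new format means registering one more reader; the calling code does not change, which is the point of a pluggable source API.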

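RDD (Spark's core abstraction from the start)
==> SchemaRDD (Spark 1.0 to 1.2: an RDD of rows plus a schema)
==> DataFrame (Spark 1.3: SchemaRDD renamed and redesigned)
==> Dataset (Spark 1.6; since Spark 2.0, DataFrame = Dataset[Row])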


R / pandas data frames: single node only
===> Spark DataFrame: similar usage, but distributed across a cluster

JD interview question: in which Spark version was each major feature introduced?


A Dataset is a distributed collection of data.
A DataFrame is a Dataset organized into named columns.
DataFrame = Table(column_name, column_type, column_value)
DataFrame = Dataset[Row]
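
A single-machine Python sketch of `DataFrame = Dataset[Row]` (hypothetical classes, not Spark's): the DataFrame holds a schema of (column_name, column_type) pairs plus untyped `Row` values, while the typed collection holds `Person` records, roughly what `df.as[Person]` gives in Scala. In Spark a misspelled column name in a DataFrame is only caught when the query is analyzed at runtime, whereas fields of a typed Dataset used inside Scala lambdas are checked by the compiler.

```python
from dataclasses import dataclass
from typing import Any, List, Tuple

# Toy, single-machine model (not Spark's classes): a "DataFrame" is a schema
# plus untyped rows, and a typed "Dataset" is a list of record objects.
# Real Spark data is distributed and lazily evaluated.


class Row(tuple):
    """Generic row: plain positional values with no field types of its own."""


Schema = List[Tuple[str, type]]  # [(column_name, column_type), ...]


@dataclass
class DataFrame:
    schema: Schema
    rows: List[Row]

    def column(self, name: str) -> List[Any]:
        # Columns are looked up by name at runtime, so a misspelled name
        # only fails (ValueError) when this line executes.
        idx = [col for col, _ in self.schema].index(name)
        return [row[idx] for row in self.rows]


@dataclass
class Person:
    """Typed record, playing the role of a Scala case class in Dataset[Person]."""
    name: str
    age: int


df = DataFrame([("name", str), ("age", int)],
               [Row(("Ann", 30)), Row(("Bob", 25))])
people: List[Person] = [Person(*row) for row in df.rows]  # analogous to df.as[Person]

print(df.column("age"))         # [30, 25]
print([p.age for p in people])  # [30, 25]
```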

DataFrame vs RDD

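DataFrame ===> performance is essentially the same in Scala, Java, Python and R: every API builds the same logical plan, which the same optimizer turns into code that runs on the JVM (Python UDFs are the exception)
RDD ===> big gap between Scala and Python: Python RDD functions run in separate Python worker processes, and every record is serialized between the JVM and Python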
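
One way to see why DataFrame performance does not depend on the front-end language: a DataFrame operation such as `col("age") > 18` is recorded as an expression tree that the optimizer can inspect, while an RDD operation is an opaque function (in PySpark, a Python function that must run in a Python worker). A toy Python contrast, not Spark code; `Col` and `Gt` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict

# Toy contrast, not Spark code. `Col` and `Gt` are hypothetical classes.
# A DataFrame-style predicate is data (an expression tree) that an optimizer
# can inspect and rewrite; an RDD-style predicate is an opaque function.


@dataclass(frozen=True)
class Col:
    name: str

    def __gt__(self, other: Any) -> "Gt":
        # Builds a tree node instead of comparing anything.
        return Gt(self, other)


@dataclass(frozen=True)
class Gt:
    left: Col
    right: Any

    def describe(self) -> str:
        return f"({self.left.name} > {self.right!r})"

    def evaluate(self, row: Dict[str, Any]) -> bool:
        return row[self.left.name] > self.right


rows = [{"name": "Ann", "age": 30}, {"name": "Bob", "age": 15}]

df_predicate = Col("age") > 18  # structure visible: column "age", constant 18
rdd_predicate: Callable[[Dict[str, Any]], bool] = lambda r: r["age"] > 18  # black box

print(df_predicate.describe())                               # (age > 18)
print([r["name"] for r in rows if df_predicate.evaluate(r)])  # ['Ann']
print([r["name"] for r in rows if rdd_predicate(r)])          # ['Ann']
```

Both predicates select the same rows, but only the tree form tells the engine which column and constant are involved, which is what rules like filter pushdown need.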


Hive AST: abstract syntax tree (the query text is first parsed into an AST, which is then analyzed and optimized)
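
To make "AST" concrete, this Python sketch parses the expression `b1.key * (2+3)` from the query in these notes into a tree of nested tuples. It is a hand-written recursive-descent parser for a tiny hypothetical grammar, not Hive's or Spark's ANTLR-based parser.

```python
import re
from typing import List

# Hand-written recursive-descent parser, a minimal sketch of the parsing step.
# Hive (and Spark SQL since 2.0) use ANTLR-generated grammars; this toy grammar
# only knows integers, (dotted) column names, +, * and parentheses:
#   expr := term ('+' term)*    term := atom ('*' atom)*
#   atom := INT | NAME | '(' expr ')'
# Nodes are tuples: ("lit", n) | ("col", name) | (op, left, right).
TOKEN = re.compile(r"\s*(\d+|[A-Za-z_][\w.]*|[+*()])")


def tokenize(text: str) -> List[str]:
    text, pos, tokens = text.strip(), 0, []
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected input: {text[pos:]!r}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def parse(text: str) -> tuple:
    tokens, pos = tokenize(text), 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        pos += 1
        return tokens[pos - 1]

    def atom():
        tok = take()
        if tok == "(":
            node = expr()
            if take() != ")":
                raise SyntaxError("expected ')'")
            return node
        if tok.isdigit():
            return ("lit", int(tok))
        if tok in "+*)":
            raise SyntaxError(f"unexpected {tok!r}")
        return ("col", tok)

    def term():  # '*' binds tighter than '+'
        node = atom()
        while peek() == "*":
            take()
            node = ("*", node, atom())
        return node

    def expr():
        node = term()
        while peek() == "+":
            take()
            node = ("+", node, term())
        return node

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected trailing token {peek()!r}")
    return tree


print(parse("b1.key * (2+3)"))
# ('*', ('col', 'b1.key'), ('+', ('lit', 2), ('lit', 3)))
```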

Spark SQL Story
Write less code
Read less data
Let the optimizer do the hard work
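
"Let the optimizer do the hard work" in miniature: Spark's Catalyst optimizer applies rewrite rules to the query tree, and one of them, constant folding (Catalyst has a rule named `ConstantFolding`), replaces literal-only subexpressions such as `(2+3)` with their value before execution. A toy Python version on a hand-built tuple tree; the tuple format is an assumption and unrelated to Catalyst's classes.

```python
from typing import Tuple

# Toy expression trees as tuples (an assumed format, unrelated to Catalyst's
# classes): ("lit", value) | ("col", name) | (op, left, right).
OPS = {"+": lambda a, b: a + b, "*": lambda a, b: a * b}


def fold(e: Tuple) -> Tuple:
    """Constant folding: rewrite bottom-up, collapsing literal-only subtrees."""
    if e[0] in ("lit", "col"):
        return e
    op, left, right = e[0], fold(e[1]), fold(e[2])
    if left[0] == "lit" and right[0] == "lit":
        # Both children are known now, so compute once while planning
        # instead of once per row at execution time.
        return ("lit", OPS[op](left[1], right[1]))
    return (op, left, right)


expr = ("*", ("col", "key"), ("+", ("lit", 2), ("lit", 3)))
print(fold(expr))  # ('*', ('col', 'key'), ('lit', 5))
```

`key * 5` itself is not folded because `key` differs from row to row; only the literal subtree collapses.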

-- Print the parsed, analyzed and optimized logical plans plus the physical plan
explain extended
select b1.key * (2+3), b2.value
from b b1 join b b2
on b1.key = b2.key and b1.key > 10;
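
In the optimized logical plan for this query, Spark typically folds `(2+3)` into `5` and moves the `key > 10` condition below the join, applying it to both b1 and b2 because the join makes their keys equal; the exact plan text varies by Spark version. The Python sketch below imitates only the filter-pushdown part on a hypothetical 20-row table `b` (keys 1 to 20). The tuple plan format and the names `push_filter_through_join` and `execute` are assumptions, not Catalyst's API; for an inner join, a condition in `ON` behaves like a filter over the join output, which is how the sketch models `b1.key > 10`.

```python
from typing import Any, Dict, List

# Toy logical plans as nested tuples (a hypothetical format, not Catalyst's):
#   ("scan", alias)
#   ("filter", ("gt", column, constant), child)
#   ("join", ("eq", left_column, right_column), left, right)   # inner join
TABLE_B = [{"key": k, "value": f"val_{k}"} for k in range(1, 21)]


def push_filter_through_join(plan):
    """Rewrite Filter(col > c, Join(l, r)) into Join(Filter(l), Filter(r)).

    Only handles a filter on one of the two equi-join columns: both join
    columns are equal in every output row, so the bound holds on both
    inputs and each side can be filtered before the join."""
    if plan[0] != "filter" or plan[2][0] != "join":
        return plan
    _, (_, col, const), (_, cond, left, right) = plan
    _, lcol, rcol = cond
    if col not in (lcol, rcol):
        return plan  # outside what this sketch handles
    return ("join", cond,
            ("filter", ("gt", lcol, const), left),
            ("filter", ("gt", rcol, const), right))


def execute(plan, stats: Dict[str, int]) -> List[Dict[str, Any]]:
    kind = plan[0]
    if kind == "scan":  # prefix columns with the alias, e.g. "b1.key"
        return [{f"{plan[1]}.{c}": v for c, v in row.items()} for row in TABLE_B]
    if kind == "filter":
        _, (_, col, const), child = plan
        return [row for row in execute(child, stats) if row[col] > const]
    _, (_, lcol, rcol), left, right = plan  # hash join on lcol = rcol
    lrows, rrows = execute(left, stats), execute(right, stats)
    stats["join_input_rows"] += len(lrows) + len(rrows)
    by_key: Dict[Any, List[Dict[str, Any]]] = {}
    for rrow in rrows:
        by_key.setdefault(rrow[rcol], []).append(rrow)
    return [{**lrow, **rrow} for lrow in lrows for rrow in by_key.get(lrow[lcol], [])]


plan = ("filter", ("gt", "b1.key", 10),
        ("join", ("eq", "b1.key", "b2.key"), ("scan", "b1"), ("scan", "b2")))
optimized = push_filter_through_join(plan)

naive_stats, opt_stats = {"join_input_rows": 0}, {"join_input_rows": 0}
naive_rows = execute(plan, naive_stats)
opt_rows = execute(optimized, opt_stats)
print(naive_rows == opt_rows)                                        # True
print(naive_stats["join_input_rows"], opt_stats["join_input_rows"])  # 40 20
```

Same answer, half the rows entering the join: this is the "read less data" idea. The projection `b1.key * (2+3), b2.value` is left out of the sketch.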


-- Number of rows per age in table xx
select age, count(1) from xx group by age;
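
Spark does not count `group by age` in one place: running `explain` on this query typically shows a partial HashAggregate per partition, an Exchange (shuffle) that hash-partitions rows by age, and a final HashAggregate. A single-process Python sketch of that two-phase idea with `collections.Counter`; the partitions and rows are hypothetical, and the shuffle is reduced to a plain merge.

```python
from collections import Counter
from typing import Dict, List

# Two-phase aggregation for: select age, count(1) from xx group by age
# Phase 1 counts inside each partition; phase 2 merges partial counts per age
# (in Spark the merge happens after a shuffle by age). Data is hypothetical.
partitions: List[List[Dict]] = [
    [{"name": "Ann", "age": 30}, {"name": "Bob", "age": 25}],
    [{"name": "Cid", "age": 30}, {"name": "Dee", "age": 30}],
]


def partial_count(rows: List[Dict]) -> Counter:
    # Runs independently on each partition, so only small partials move.
    return Counter(r["age"] for r in rows)


def final_count(partials: List[Counter]) -> Dict[int, int]:
    total: Counter = Counter()
    for p in partials:
        total.update(p)  # Counter.update adds counts for matching keys
    return dict(total)


partials = [partial_count(p) for p in partitions]
result = final_count(partials)
print(sorted(result.items()))  # [(25, 1), (30, 3)]
```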


Temp (temporary) tables: live only as long as the application that created them and are not saved in the metastore