Hive Syntax: Introductory Exercises

Create a database: create database database_name;

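List databases: show databases;

Switch to a database: use database_name;

List tables: show tables;

List tables matching a pattern: show tables 's.*';

List Hive functions: show functions;

List a table's partitions: show partitions table_name;

Show a table's structure: desc table_name;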

Create a simple (internal) table:

create table student(
  sid int,
  sname string
);

Create an external table:

create external table stu(
  sid int comment 'student id',
  sname string comment 'student name')
row format delimited fields terminated by '\t'
stored as textfile location '/user/root';

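First, the difference between internal (managed) and external tables in Hive:

1) On creation: with an internal table, Hive moves the data into the path that the data warehouse points to; with an external table, Hive only records where the data lives and does not change its location.

2) On deletion: dropping an internal table deletes both its metadata and its data, while dropping an external table deletes only the metadata and leaves the data in place. External tables are therefore somewhat safer, allow more flexible data organization, and make it easy to share the source data.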
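The two points above can be sketched in HiveQL as follows. The table names managed_demo and external_demo and the HDFS path are hypothetical, chosen only for illustration; this is a minimal sketch rather than a recommended layout.

```sql
-- Hypothetical tables for illustration; the HDFS path is an assumption.
create table managed_demo (sid int, sname string);

create external table external_demo (sid int, sname string)
row format delimited fields terminated by '\t'
location '/user/root/external_demo';

-- The "Table Type" field in the output reads MANAGED_TABLE or EXTERNAL_TABLE.
describe formatted managed_demo;
describe formatted external_demo;

-- Removes the metadata and the data files in the warehouse directory.
drop table managed_demo;

-- Removes only the metadata; files under '/user/root/external_demo' remain.
drop table external_demo;
```

Because the files of an external table survive a drop, re-creating the table with the same location makes the data queryable again.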

Create a partitioned table:

create table stu1(
  sid int,
  sname string
)
partitioned by (grade int)
row format delimited fields terminated by '\t';

Create a table with a partition column ds:

CREATE TABLE invites (foo INT, bar STRING) PARTITIONED BY (ds STRING);

Copy a table's structure into a new, empty table:

create table copy_stu like stu;

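Add a column to a table (the second form also attaches a comment; run one or the other, since both add the same column):

alter table stu add columns (grade int);

alter table stu add columns (grade int comment 'student grade');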

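Drop a column by redefining the column list with replace columns (any column left out, such as grade, is removed):

alter table stu replace columns(
  sid int,
  sname string
);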

Rename a table:

alter table stu rename to stu0;

Change a column's name, type, position, or comment:

alter table stu0 change grade grades double;

-- grade is the old column name (old_column); grades is the new one (new_column)
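The statement above changes only the name and type. As a hedged sketch, the same change clause also accepts an optional comment and a position (first, or after another column). The example assumes stu0 currently has the columns sid, sname, and grade, and is meant as an alternative to the statement above rather than a follow-up to it.

```sql
-- Assumes stu0 currently has the columns sid, sname, and grade.
-- Renames grade to grades, changes its type to double, attaches a comment,
-- and moves the column so that it follows sid.
alter table stu0 change grade grades double comment 'student grade' after sid;

-- Check the resulting column order and comments.
desc stu0;
```

This statement modifies the table's metadata only; the data files are not rewritten, so the data must already be readable under the new schema. Whether a given type or position change is accepted can depend on the Hive version and configuration.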

Note: once data has been loaded into an ordinary Hive table, it cannot be modified row by row!

Load a data file from the local file system, optionally into a specific partition:

load data local inpath '/root/hive_data/stu.txt' overwrite into table stu;

load data local inpath '/root/hive_data/stu1_1.txt' into table stu1 partition(grade = 80);
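To check the effect of the partitioned load above, a short sketch along these lines may help. It assumes the stu1 table defined earlier; the grade = 90 partition is hypothetical and added only for illustration.

```sql
-- Lists the partitions of stu1, for example grade=80 after the load above.
show partitions stu1;

-- Filtering on the partition column lets Hive read only the matching
-- partition directory instead of scanning the whole table.
select sid, sname from stu1 where grade = 80;

-- Hypothetical second partition, created empty without loading any data.
alter table stu1 add partition (grade = 90);
```

Note that the partition value is not read from the data file; it is taken from the partition clause and stored in the partition's directory name.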

For an explanation of partitions, see https://www.cnblogs.com/kouryoushine/p/7801924.html

Load a data file from HDFS:

load data inpath '/user/root/stu.txt' into table stu;

Insert query results into a Hive table:

insert overwrite table stu3 select * from stu2;

Write query results to an HDFS directory:

insert overwrite directory '/user/root' select * from stu2;

Write query results to a local file or directory:

hive -e "select * from test.stu2" > test_stu2.txt

insert overwrite local directory '/root/hive_data' select * from stu2;
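By default the files written by insert overwrite directory use Hive's default field delimiter (Ctrl-A), which is awkward to read in a text editor. As a sketch, a row format clause can be added to pick a delimiter (supported since Hive 0.11). The target directory below is hypothetical, and stu2 is assumed to have the columns sid and sname.

```sql
-- Hypothetical export directory; any existing contents are overwritten.
-- Assumes stu2 has the columns sid and sname.
insert overwrite local directory '/root/hive_data/stu2_export'
row format delimited fields terminated by '\t'
select sid, sname from stu2;
```

Since overwrite replaces whatever is already in the target directory, a dedicated export directory is safer than a shared one.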

Insert results computed from one table into another (two ways):

from stu2 a insert overwrite table stu_count select a.sid;

insert overwrite table stu_count select count(*) from stu2;

Differences between order by, sort by, distribute by, and cluster by

https://blog.****.net/jthink_/article/details/38903775

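1. order by

order by in Hive does the same job as order by in traditional SQL: it sorts the query result globally. As a result, whenever a Hive query specifies order by, all of the data goes to a single reducer (no matter how many map tasks run or how many blocks the files span, only one reducer is launched). For large data sets this can take a very long time.

There is one further difference from traditional SQL: if hive.mapred.mode=strict is set (the default is nonstrict, at least in older Hive releases), an order by query must also specify limit to cap the number of output rows. The reason is that all of the data is processed on one reducer, and with a large volume the query may never produce a result, so strict mode requires the output size to be bounded.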
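A minimal sketch of the strict-mode rule described above. It assumes stu2 has the columns sid and sname, which the original does not spell out.

```sql
-- Turn on strict mode for the current session.
set hive.mapred.mode=strict;

-- Rejected in strict mode, because order by is used without limit:
-- select sid, sname from stu2 order by sid;

-- Accepted: the single reducer only has to emit the first 10 rows.
select sid, sname from stu2 order by sid limit 10;
```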

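2. sort by

When a query specifies sort by, each reducer sorts its own output, which guarantees local order only: the data coming out of each reducer is sorted, but the data as a whole is not, unless there is only one reducer. The benefit is that once the local sorts are done, a subsequent global sort becomes much cheaper, since a single merge pass over the sorted outputs is enough to produce a global order.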
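The local-ordering behavior can be tried with a sketch like the one below. It assumes stu2 has the columns sid and sname and that the query runs on the MapReduce engine; the reducer count of 3 is arbitrary. The last two statements briefly show distribute by and cluster by, which the heading of this section mentions.

```sql
-- Assumption: stu2 has the columns sid and sname; 3 reducers chosen arbitrarily.
set mapreduce.job.reduces=3;

-- Each reducer's output is sorted by sid, but the combined result
-- is not globally ordered.
select sid, sname from stu2 sort by sid;

-- distribute by sends all rows with the same sid to the same reducer;
-- sort by then orders the rows within each reducer.
select sid, sname from stu2 distribute by sid sort by sid;

-- cluster by sid is shorthand for distribute by sid sort by sid.
select sid, sname from stu2 cluster by sid;
```

With only one reducer, sort by and order by produce the same result, which matches the exception noted above.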


For further reference, see the following link:

https://www.cnblogs.com/HondaHsu/p/4346354.html

(To be continued~)
