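Hive Case Study (4)
Case: Python or R, which is better suited to big data (Spark/Hadoop) and deep learning?
Question 1: Among big data (Spark/Hadoop) users, which language has more users, Python or R?
Preparing the data: a screenshot is shown below; the dataset itself is in my uploaded resources and can be downloaded from there.
Now let's work through the question above.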
#Create the database
CREATE DATABASE db_language;
#Create the table
CREATE TABLE db_language.tb_language_account(
  id_number string,
  area string,
  python string,
  r string,
  sql_str string,
  rapidminer string,
  excel string,
  spark string,
  mangshe string,
  tensorflow string,
  scikit_learn string,
  tableau string,
  knime string,
  deep string,
  spark_hadoop string,
  ntools int,
  votetools string
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n';
#Load the data
LOAD DATA LOCAL INPATH '/opt/data/sw17-top11-dl-sh.anon.csv' INTO TABLE db_language.tb_language_account;
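A quick sanity check after loading can catch a wrong path or delimiter early. The following is only a minimal sketch, assuming the table was created exactly as defined above; the alias total_rows is a name chosen here for illustration and does not come from the original post.

```sql
-- Show the column layout Hive recorded for the table
DESCRIBE db_language.tb_language_account;

-- Peek at a few rows; if every column after the first is NULL,
-- the field delimiter probably does not match the file
SELECT * FROM db_language.tb_language_account LIMIT 5;

-- Total number of rows loaded (this includes a header row, if the CSV has one)
SELECT count(*) AS total_rows FROM db_language.tb_language_account;
```

If the first rows look sensible, the counting queries that follow can be run with more confidence.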
#Among big data (Spark/Hadoop) users, how many use Python (683)
#Aggregate functions: count(), sum(), avg(), max(), ...
SELECT count(*) as count FROM db_language.tb_language_account WHERE python="1" AND spark_hadoop="1";
#Among big data (Spark/Hadoop) users, how many use R
SELECT count(*) as count FROM db_language.tb_language_account WHERE r="1" AND spark_hadoop="1";
#Combined result:
#Counts: Python 683, R 606
SELECT t1.p_c, t2.r_c
FROM (
  SELECT count(*) as p_c, "1" as id
  FROM db_language.tb_language_account
  WHERE python="1" AND spark_hadoop="1"
) t1
JOIN (
  SELECT count(*) as r_c, "1" as id
  FROM db_language.tb_language_account
  WHERE r="1" AND spark_hadoop="1"
) t2
ON t1.id=t2.id;
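The join on a constant id is one way to put the two counts side by side. As an alternative sketch (not the approach used in this post), the same two numbers can be produced in a single pass over the table with conditional aggregation, using Hive's built-in if() and sum(). The aliases p_c and r_c are reused from the query above.

```sql
-- Filter to Spark/Hadoop respondents once, then count each language
-- by summing 1 for matching rows and 0 otherwise
SELECT
  sum(if(python = "1", 1, 0)) AS p_c,
  sum(if(r = "1", 1, 0))      AS r_c
FROM db_language.tb_language_account
WHERE spark_hadoop = "1";
```

This avoids the two subqueries and the join, so the table is scanned only once. One difference to keep in mind: if no row satisfies spark_hadoop="1", sum() returns NULL, whereas count(*) would return 0.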
#Note
My data file is located at /opt/data/sw17-top11-dl-sh.anon.csv, and all of the operations above were run in Hive.
#Reminder
When running SQL statements in Hive, don't forget the trailing ";" at the end of each statement. This is easy to miss.