HIVE内关联顺序影响返回结果?求助。。。

@求助!HIVE内关联问题

HIVE内关联顺序影响返回结果?求助。。。

执行代码

1.idl_dim_ssj_user_device_pool 作为主表

select count(pool.udid) from dt.idl_dim_ssj_user_device_pool pool
inner join dt.idl_dim_ssj_user usr 
on pool.fname=usr.fname
;

1.1 执行计划如下:

| STAGE DEPENDENCIES:                                                                                           |
|   Stage-6 is a root stage , consists of Stage-1                                                               |
|   Stage-1                                                                                                     |
|   Stage-2 depends on stages: Stage-1                                                                          |
|   Stage-0 depends on stages: Stage-2                                                                          |
|                                                                                                               |
| STAGE PLANS:                                                                                                  |
|   Stage: Stage-6                                                                                              |
|     Conditional Operator                                                                                      |
|                                                                                                               |
|   Stage: Stage-1                                                                                              |
|     Map Reduce                                                                                                |
|       Map Operator Tree:                                                                                      |
|           TableScan                                                                                           |
|             alias: pool                                                                                       |
|             Statistics: Num rows: 76926919 Data size: 769269190 Basic stats: COMPLETE Column stats: NONE      |
|             Filter Operator                                                                                   |
|               predicate: fname is not null (type: boolean)                                                    |
|               Statistics: Num rows: 38463460 Data size: 384634600 Basic stats: COMPLETE Column stats: NONE    |
|               Reduce Output Operator                                                                          |
|                 key expressions: fname (type: string)                                                         |
|                 sort order: +                                                                                 |
|                 Map-reduce partition columns: fname (type: string)                                            |
|                 Statistics: Num rows: 38463460 Data size: 384634600 Basic stats: COMPLETE Column stats: NONE  |
|                 value expressions: udid (type: string)                                                        |
|           TableScan                                                                                           |
|             alias: usr                                                                                        |
|             Statistics: Num rows: 31374455 Data size: 470616825 Basic stats: COMPLETE Column stats: NONE      |
|             Filter Operator                                                                                   |
|               predicate: fname is not null (type: boolean)                                                    |
|               Statistics: Num rows: 15687228 Data size: 235308419 Basic stats: COMPLETE Column stats: NONE    |
|               Reduce Output Operator                                                                          |
|                 key expressions: fname (type: string)                                                         |
|                 sort order: +                                                                                 |
|                 Map-reduce partition columns: fname (type: string)                                            |
|                 Statistics: Num rows: 15687228 Data size: 235308419 Basic stats: COMPLETE Column stats: NONE  |
|       Reduce Operator Tree:                                                                                   |
|         Join Operator                                                                                         |
|           condition map:                                                                                      |
|                Inner Join 0 to 1                                                                              |
|           keys:                                                                                               |
|             0 fname (type: string)                                                                            |
|             1 fname (type: string)                                                                            |
|           outputColumnNames: _col0                                                                            |
|           Statistics: Num rows: 42309806 Data size: 423098069 Basic stats: COMPLETE Column stats: NONE        |
|           Group By Operator                                                                                   |
|             aggregations: count(_col0)                                                                        |
|             mode: hash                                                                                        |
|             outputColumnNames: _col0                                                                          |
|             Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE                     |
|             File Output Operator                                                                              |
|               compressed: true                                                                                |
|               table:                                                                                          |
|                   input format: org.apache.hadoop.mapred.SequenceFileInputFormat                              |
|                   output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat                    |
|                   serde: org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe                             |
|                                                                                                               |
|   Stage: Stage-2                                                                                              |
|     Map Reduce                                                                                                |
|       Map Operator Tree:                                                                                      |
|           TableScan                                                                                           |
|             Reduce Output Operator                                                                            |
|               sort order:                                                                                     |
|               Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE                   |
|               value expressions: _col0 (type: bigint)                                                         |
|       Reduce Operator Tree:                                                                                   |
|         Group By Operator                                                                                     |
|           aggregations: count(VALUE._col0)                                                                    |
|           mode: mergepartial                                                                                  |
|           outputColumnNames: _col0                                                                            |
|           Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE                       |
|           File Output Operator                                                                                |
|             compressed: false                                                                                 |
|             Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE                     |
|             table:                                                                                            |
|                 input format: org.apache.hadoop.mapred.TextInputFormat                                        |
|                 output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat                     |
|                 serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe                                     |
|                                                                                                               |
|   Stage: Stage-0                                                                                              |
|     Fetch Operator                                                                                            |
|       limit: -1                                                                                               |
|       Processor Tree:                                                                                         |
|         ListSink                                                                                              |
|                                                                                                               

1.2 执行日志和结果:

INFO  : Compiling command(queryId=hive_20190111095656_9cf4a75d-2b3c-4f83-8b0b-01e7cb00b392): select count(pool.udid) from dt.idl_dim_ssj_user_device_pool pool
inner join dt.idl_dim_ssj_user usr 
on pool.fname=usr.fname
INFO  : Semantic Analysis Completed
INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:_c0, type:bigint, comment:null)], properties:null)
INFO  : Completed compiling command(queryId=hive_20190111095656_9cf4a75d-2b3c-4f83-8b0b-01e7cb00b392); Time taken: 0.336 seconds
INFO  : Executing command(queryId=hive_20190111095656_9cf4a75d-2b3c-4f83-8b0b-01e7cb00b392): select count(pool.udid) from dt.idl_dim_ssj_user_device_pool pool
inner join dt.idl_dim_ssj_user usr 
on pool.fname=usr.fname
INFO  : Query ID = hive_20190111095656_9cf4a75d-2b3c-4f83-8b0b-01e7cb00b392
INFO  : Total jobs = 2
INFO  : Starting task [Stage-6:CONDITIONAL] in serial mode
INFO  : Stage-1 is selected by condition resolver.
INFO  : Launching Job 1 out of 2
INFO  : Starting task [Stage-1:MAPRED] in serial mode
INFO  : Number of reduce tasks not specified. Estimated from input data size: 51
INFO  : In order to change the average load for a reducer (in bytes):
INFO  :   set hive.exec.reducers.bytes.per.reducer=<number>
INFO  : In order to limit the maximum number of reducers:
INFO  :   set hive.exec.reducers.max=<number>
INFO  : In order to set a constant number of reducers:
INFO  :   set mapreduce.job.reduces=<number>
INFO  : number of splits:75
INFO  : Submitting tokens for job: job_1541645032015_2447580
......
INFO  : MapReduce Total cumulative CPU time: 47 seconds 580 msec
INFO  : Ended Job = job_1541645032015_2447599
INFO  : MapReduce Jobs Launched: 
INFO  : Stage-Stage-1: Map: 75  Reduce: 51   Cumulative CPU: 1180.49 sec   HDFS Read: 1160032550 HDFS Write: 10965 SUCCESS
INFO  : Stage-Stage-2: Map: 34  Reduce: 1   Cumulative CPU: 47.58 sec   HDFS Read: 86838 HDFS Write: 2 SUCCESS
INFO  : Total MapReduce CPU Time Spent: 20 minutes 28 seconds 70 msec
INFO  : Completed executing command(queryId=hive_20190111095656_9cf4a75d-2b3c-4f83-8b0b-01e7cb00b392); Time taken: 121.192 seconds
INFO  : OK
+------+--+
| _c0  |
+------+--+
| 0    |
+------+--+
1 row selected (121.668 seconds)

2.idl_dim_ssj_user作为主表

select count(pool.udid) from dt.idl_dim_ssj_user usr
inner join dt.idl_dim_ssj_user_device_pool pool 
on usr.fname=pool.fname
;

2.1 执行计划:

| STAGE DEPENDENCIES:                                                                                           |
|   Stage-6 is a root stage , consists of Stage-1                                                               |
|   Stage-1                                                                                                     |
|   Stage-2 depends on stages: Stage-1                                                                          |
|   Stage-0 depends on stages: Stage-2                                                                          |
|                                                                                                               |
| STAGE PLANS:                                                                                                  |
|   Stage: Stage-6                                                                                              |
|     Conditional Operator                                                                                      |
|                                                                                                               |
|   Stage: Stage-1                                                                                              |
|     Map Reduce                                                                                                |
|       Map Operator Tree:                                                                                      |
|           TableScan                                                                                           |
|             alias: usr                                                                                        |
|             Statistics: Num rows: 31374455 Data size: 470616825 Basic stats: COMPLETE Column stats: NONE      |
|             Filter Operator                                                                                   |
|               predicate: fname is not null (type: boolean)                                                    |
|               Statistics: Num rows: 15687228 Data size: 235308419 Basic stats: COMPLETE Column stats: NONE    |
|               Reduce Output Operator                                                                          |
|                 key expressions: fname (type: string)                                                         |
|                 sort order: +                                                                                 |
|                 Map-reduce partition columns: fname (type: string)                                            |
|                 Statistics: Num rows: 15687228 Data size: 235308419 Basic stats: COMPLETE Column stats: NONE  |
|           TableScan                                                                                           |
|             alias: pool                                                                                       |
|             Statistics: Num rows: 76926919 Data size: 769269190 Basic stats: COMPLETE Column stats: NONE      |
|             Filter Operator                                                                                   |
|               predicate: fname is not null (type: boolean)                                                    |
|               Statistics: Num rows: 38463460 Data size: 384634600 Basic stats: COMPLETE Column stats: NONE    |
|               Reduce Output Operator                                                                          |
|                 key expressions: fname (type: string)                                                         |
|                 sort order: +                                                                                 |
|                 Map-reduce partition columns: fname (type: string)                                            |
|                 Statistics: Num rows: 38463460 Data size: 384634600 Basic stats: COMPLETE Column stats: NONE  |
|                 value expressions: udid (type: string)                                                        |
|       Reduce Operator Tree:                                                                                   |
|         Join Operator                                                                                         |
|           condition map:                                                                                      |
|                Inner Join 0 to 1                                                                              |
|           keys:                                                                                               |
|             0 fname (type: string)                                                                            |
|             1 fname (type: string)                                                                            |
|           outputColumnNames: _col18                                                                           |
|           Statistics: Num rows: 42309806 Data size: 423098069 Basic stats: COMPLETE Column stats: NONE        |
|           Group By Operator                                                                                   |
|             aggregations: count(_col18)                                                                       |
|             mode: hash                                                                                        |
|             outputColumnNames: _col0                                                                          |
|             Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE                     |
|             File Output Operator                                                                              |
|               compressed: true                                                                                |
|               table:                                                                                          |
|                   input format: org.apache.hadoop.mapred.SequenceFileInputFormat                              |
|                   output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat                    |
|                   serde: org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe                             |
|                                                                                                               |
|   Stage: Stage-2                                                                                              |
|     Map Reduce                                                                                                |
|       Map Operator Tree:                                                                                      |
|           TableScan                                                                                           |
|             Reduce Output Operator                                                                            |
|               sort order:                                                                                     |
|               Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE                   |
|               value expressions: _col0 (type: bigint)                                                         |
|       Reduce Operator Tree:                                                                                   |
|         Group By Operator                                                                                     |
|           aggregations: count(VALUE._col0)                                                                    |
|           mode: mergepartial                                                                                  |
|           outputColumnNames: _col0                                                                            |
|           Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE                       |
|           File Output Operator                                                                                |
|             compressed: false                                                                                 |
|             Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE                     |
|             table:                                                                                            |
|                 input format: org.apache.hadoop.mapred.TextInputFormat                                        |
|                 output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat                     |
|                 serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe                                     |
|                                                                                                               |
|   Stage: Stage-0                                                                                              |
|     Fetch Operator                                                                                            |
|       limit: -1                                                                                               |
|       Processor Tree:                                                                                         |
|         ListSink                                                                                              |

2.2 执行日志和结果:

INFO  : Compiling command(queryId=hive_20190111142020_c6710c56-cd4d-4919-8cea-b06ed2218502): select count(pool.udid) from dt.idl_dim_ssj_user usr
inner join dt.idl_dim_ssj_user_device_pool pool 
on usr.fname=pool.fname
INFO  : Semantic Analysis Completed
INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:_c0, type:bigint, comment:null)], properties:null)
INFO  : Completed compiling command(queryId=hive_20190111142020_c6710c56-cd4d-4919-8cea-b06ed2218502); Time taken: 1.286 seconds
INFO  : Executing command(queryId=hive_20190111142020_c6710c56-cd4d-4919-8cea-b06ed2218502): select count(pool.udid) from dt.idl_dim_ssj_user usr
inner join dt.idl_dim_ssj_user_device_pool pool 
on usr.fname=pool.fname
INFO  : Query ID = hive_20190111142020_c6710c56-cd4d-4919-8cea-b06ed2218502
INFO  : Total jobs = 2
INFO  : Starting task [Stage-6:CONDITIONAL] in serial mode
INFO  : Stage-1 is selected by condition resolver.
INFO  : Launching Job 1 out of 2
INFO  : Starting task [Stage-1:MAPRED] in serial mode
INFO  : Number of reduce tasks not specified. Estimated from input data size: 51
INFO  : In order to change the average load for a reducer (in bytes):
INFO  :   set hive.exec.reducers.bytes.per.reducer=<number>
INFO  : In order to limit the maximum number of reducers:
INFO  :   set hive.exec.reducers.max=<number>
INFO  : In order to set a constant number of reducers:
INFO  :   set mapreduce.job.reduces=<number>
INFO  : number of splits:75
INFO  : Submitting tokens for job: job_1541645032015_2453904
INFO  : Hadoop job information for Stage-1: number of mappers: 75; number of reducers: 51
......
INFO  : MapReduce Total cumulative CPU time: 20 minutes 50 seconds 230 msec
INFO  : Ended Job = job_1541645032015_2453904
INFO  : Launching Job 2 out of 2
INFO  : Starting task [Stage-2:MAPRED] in serial mode
INFO  : Number of reduce tasks determined at compile time: 1
INFO  : In order to change the average load for a reducer (in bytes):
INFO  :   set hive.exec.reducers.bytes.per.reducer=<number>
INFO  : In order to limit the maximum number of reducers:
INFO  :   set hive.exec.reducers.max=<number>
INFO  : In order to set a constant number of reducers:
INFO  :   set mapreduce.job.reduces=<number>
INFO  : number of splits:34
INFO  : Submitting tokens for job: job_1541645032015_2453919
INFO  : Hadoop job information for Stage-2: number of mappers: 34; number of reducers: 1
......
INFO  : MapReduce Total cumulative CPU time: 42 seconds 330 msec
INFO  : Ended Job = job_1541645032015_2453919
INFO  : MapReduce Jobs Launched: 
INFO  : Stage-Stage-1: Map: 75  Reduce: 51   Cumulative CPU: 1250.23 sec   HDFS Read: 4787835672 HDFS Write: 11118 SUCCESS
INFO  : Stage-Stage-2: Map: 34  Reduce: 1   Cumulative CPU: 42.33 sec   HDFS Read: 86991 HDFS Write: 9 SUCCESS
INFO  : Total MapReduce CPU Time Spent: 21 minutes 32 seconds 560 msec
INFO  : Completed executing command(queryId=hive_20190111142020_c6710c56-cd4d-4919-8cea-b06ed2218502); Time taken: 112.435 seconds
INFO  : OK
+-----------+--+
|    _c0    |
+-----------+--+
| 37683117  |
+-----------+--+
1 row selected (113.862 seconds)

3. 疑问???

内关联,两个表先后顺序不同,最多是影响MP执行效率,但不应该执行的返回结果不同啊!求大佬帮忙看下,谢谢
另外附上两边执行计划对比差异图:
HIVE内关联顺序影响返回结果?求助。。。上图是对比差异 也看不出有啥异常。。。。求助,感谢