Filling NULL gaps in a table with averaged values

Question:

I have a table with the fields (id, letter, date) and some data:

1 A 2012-01-01 
2 B NULL 
3 C NULL 
4 D 2012-01-15 

I want to fill the NULL values with the average of the nearest non-NULL dates, like this:

1 A 2012-01-01 
2 B 2012-01-08 
3 C 2012-01-08 
4 D 2012-01-15 

Or perhaps even like this:

1 A 2012-01-01 
2 B 2012-01-08 
3 C 2012-01-11 
4 D 2012-01-15 

Either variant would be great. Is there a simple way to do this in MySQL?

Thanks in advance.

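UPD: The table is fairly large, about 700,000 records, with about 50,000 gaps like the one described.

UPD2: To be a bit clearer, the table can look like this: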

1 A 2012-01-01 
2 B NULL 
3 C NULL 
4 D 2012-01-15 
5 E NULL 
6 F 2012-01-17 
7 G NULL 
8 H NULL 
9 I 2012-01-20 

The expected result is:

1 A 2012-01-01 
2 B **2012-01-08** 
3 C **2012-01-08** 
4 D 2012-01-15 
5 E **2012-01-16** 
6 F 2012-01-17 
7 G **2012-01-18** 
8 H **2012-01-18** 
9 I 2012-01-20 

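(The asterisks mark the changed values.) Thanks.

UPD3: Thanks, everyone. But I will do it another way and compute the date with a simple formula: needed_date = [(max(date) - min(date)) / (max(id) - min(id))] * (my_id - min(id)) + min(date)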
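
The formula above can be sketched as a single MySQL query. This is only an illustration under stated assumptions: it uses the table mytable(id, letter, date) from the question, reads min/max literally as taken over the whole table (applying it per gap would need the bounds of each gap instead), and rounds the result to whole days. The aliases m, b and filled_date are made up for the example.

```sql
-- Sketch of the UPD3 formula, assuming min/max are taken over the whole table.
-- filled_date = min_date + (max_date - min_date) / (max_id - min_id) * (id - min_id)
SELECT m.id,
       m.letter,
       COALESCE(
           m.`date`,                       -- keep existing dates untouched
           b.min_date + INTERVAL ROUND(
               DATEDIFF(b.max_date, b.min_date)
               / (b.max_id - b.min_id)     -- NULL if the table has a single id
               * (m.id - b.min_id)
           ) DAY
       ) AS filled_date
FROM mytable AS m
CROSS JOIN (
    -- one-row summary of the id range and the date range
    SELECT MIN(id) AS min_id, MAX(id) AS max_id,
           MIN(`date`) AS min_date, MAX(`date`) AS max_date
    FROM mytable
) AS b
ORDER BY m.id;
```

Because the summary subquery scans the table once and the outer query scans it once more, this avoids self-joins, which may matter at the table size mentioned above.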

Looking at the first data set: if you had further records such as '5, E, NULL' and '6, F, 2012-01-20', what would the result be? – 2013-03-22 18:05:12

Why do you want to manipulate the data? This calculation should be done when the records are retrieved. – Kermit 2013-03-22 18:05:32

What is the correlation between the order of the records and the values of the field (i.e., will B always come earlier in time)? – 2013-03-22 18:07:50

Assuming you have a table called T like this:

CREATE TABLE T(
    id INT,
    time DATETIME
);

The following query will give you the bounds for each NULL record:

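-- For each row with a NULL time: latest earlier and earliest later non-NULL time
SELECT T.Id
    , MAX(T1.Time) AS MinDate
    , MIN(T2.Time) AS MaxDate
FROM T
INNER JOIN T T1 ON T1.Id < T.Id
       AND T.time IS NULL
       AND T1.time IS NOT NULL
INNER JOIN T T2 ON T2.Id > T.Id
       AND T.time IS NULL
       AND T2.time IS NOT NULL
GROUP BY T.Id  -- qualified, since a bare Id is ambiguous across T, T1 and T2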

The output will be:

Id MinDate  MaxDate 
2 2012-01-01 2012-01-15 
3 2012-01-01 2012-01-15 

So the next step is to use this result set to update the NULL values, for instance with the average of the two bounds:

UPDATE T 
INNER JOIN 
(
    SELECT T.Id, MAX(T1.Time) as MinTime, MIN(T2.Time) as MaxTime 
    FROM T 
    INNER JOIN T T1 ON T1.id < T.id 
       AND T.time IS NULL 
       AND NOT T1.time IS NULL 
    INNER JOIN T T2 ON T2.id > T.id 
       AND T.time IS NULL 
       AND NOT T2.time IS NULL  
    GROUP BY T.ID) T3 
ON T3.id = T.id 
SET T.time = FROM_UNIXTIME((UNIX_TIMESTAMP(T3.MinTime) + UNIX_TIMESTAMP(T3.MaxTime))/2) 
WHERE T.time IS NULL 

Working SQLFiddle Here
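
The bounds above take MAX/MIN over all earlier and later rows, so they match the neighbouring dates only when dates grow with id. As a variant sketch, the nearest non-NULL neighbours by id can be looked up with correlated subqueries. The aliases g, p, n, prev_time and next_time are made up for the example, and it assumes id is indexed (for instance as a primary key), which the table definition above does not declare. How it performs on a large table has not been measured here.

```sql
-- Sketch: for every gap row, fetch the closest non-NULL time on each side by id.
SELECT g.id,
       (SELECT p.time
          FROM T AS p
         WHERE p.id < g.id AND p.time IS NOT NULL
         ORDER BY p.id DESC        -- nearest earlier row first
         LIMIT 1) AS prev_time,
       (SELECT n.time
          FROM T AS n
         WHERE n.id > g.id AND n.time IS NOT NULL
         ORDER BY n.id ASC         -- nearest later row first
         LIMIT 1) AS next_time
FROM T AS g
WHERE g.time IS NULL;              -- only rows that need filling
```

Each subquery stops at the first matching row, so with an index on id it only has to walk past the rows of the current gap rather than join against the whole table.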

Thanks, it is a solution. But its EXPLAIN on a table of about 700,000 records is not that good :( – 2013-03-22 18:26:46

QUERY #1

SELECT id,letter,IFNULL(date,dt) date FROM mytable, 
(SELECT DATE(mindate + INTERVAL (secdiff/2) SECOND) dt 
FROM (SELECT mindate,UNIX_TIMESTAMP(maxdate) 
- UNIX_TIMESTAMP(mindate) secdiff 
FROM (SELECT MIN(date) mindate FROM mytable) N, 
(SELECT MAX(date) maxdate FROM mytable) X) AA) A; 

Sample data

mysql> DROP TABLE IF EXISTS mytable; 
Query OK, 0 rows affected (0.00 sec) 

mysql> CREATE TABLE mytable 
    -> (
    -> id int not null auto_increment, 
    -> letter char(1), 
    -> `date` date, 
    -> primary key (id) 
    ->); 
Query OK, 0 rows affected (0.07 sec) 

mysql> INSERT INTO mytable (letter,date) VALUES 
    -> ('A','2012-01-01'),('B',NULL),('C',NULL),('D','2012-01-15'); 
Query OK, 4 rows affected (0.00 sec) 
Records: 4 Duplicates: 0 Warnings: 0 

mysql> SELECT * FROM mytable; 
+----+--------+------------+
| id | letter | date       |
+----+--------+------------+
|  1 | A      | 2012-01-01 |
|  2 | B      | NULL       |
|  3 | C      | NULL       |
|  4 | D      | 2012-01-15 |
+----+--------+------------+
4 rows in set (0.00 sec) 

mysql> 

QUERY #1 executed

mysql> SELECT id,letter,IFNULL(date,dt) date FROM mytable, 
    -> (SELECT DATE(mindate + INTERVAL (secdiff/2) SECOND) dt 
    -> FROM (SELECT mindate,UNIX_TIMESTAMP(maxdate) 
    -> - UNIX_TIMESTAMP(mindate) secdiff 
    -> FROM (SELECT MIN(date) mindate FROM mytable) N, 
    -> (SELECT MAX(date) maxdate FROM mytable) X) AA) A; 
+----+--------+------------+
| id | letter | date       |
+----+--------+------------+
|  1 | A      | 2012-01-01 |
|  2 | B      | 2012-01-08 |
|  3 | C      | 2012-01-08 |
|  4 | D      | 2012-01-15 |
+----+--------+------------+
4 rows in set (0.00 sec) 

mysql> 

QUERY #2 (cleaner version)

This query uses the average of the UNIX timestamps of the non-NULL dates. If all dates are NULL, it uses today's date:

SELECT id,letter,IFNULL(date,dt) date FROM mytable, 
(
    SELECT IF(K=0,DATE(NOW()),avgdt) dt FROM 
    (SELECT DATE(FROM_UNIXTIME(AVG(UNIX_TIMESTAMP(date)))) 
    avgdt FROM mytable) AA, 
    (SELECT COUNT(date) K FROM mytable) BB 
) A; 

QUERY #2 executed

mysql> SELECT id,letter,IFNULL(date,dt) date FROM mytable, 
    -> (
    ->  SELECT IF(K=0,DATE(NOW()),avgdt) dt FROM 
    ->  (SELECT DATE(FROM_UNIXTIME(AVG(UNIX_TIMESTAMP(date)))) 
    ->  avgdt FROM mytable) AA, 
    ->  (SELECT COUNT(date) K FROM mytable) BB 
    ->) A; 
+----+--------+------------+
| id | letter | date       |
+----+--------+------------+
|  1 | A      | 2012-01-01 |
|  2 | B      | 2012-01-08 |
|  3 | C      | 2012-01-08 |
|  4 | D      | 2012-01-15 |
+----+--------+------------+
4 rows in set (0.05 sec) 

mysql> 
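
The same idea as QUERY #2 can probably be written more compactly. The sketch below relies on AVG ignoring NULLs and returning NULL when there is no non-NULL date at all, so COALESCE can fall through to today's date without a separate COUNT. The aliases m, a and avg_date are made up for the example.

```sql
-- Sketch: fill every NULL date with the table-wide average date,
-- or with today's date when the table holds no dates at all.
SELECT m.id,
       m.letter,
       COALESCE(m.`date`, a.avg_date, CURDATE()) AS `date`
FROM mytable AS m
CROSS JOIN (
    -- AVG skips NULLs; the whole expression is NULL if every date is NULL
    SELECT DATE(FROM_UNIXTIME(AVG(UNIX_TIMESTAMP(`date`)))) AS avg_date
    FROM mytable
) AS a;
```

Like QUERY #2, this fills every gap with one table-wide value rather than a per-gap average.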

Give it a try!

Thanks. But this would change all my NULLs to the average date between the first pair of non-NULL values. Please see the updated question :( – 2013-03-22 19:08:51