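Using arithmetic in SQL on my own columns to fill a third column where it is zero (complicated: only when certain criteria are met)

Problem description:

So here is my question. Brace yourself, because it takes some thinking just to wrap your head around what I am trying to do. I am working with Quarterly Census of Employment and Wages (QCEW) data. QCEW data has something called suppression codes. If a data denomination (for overall, location quotient, and over the year, for each quarter of each year) is suppressed, then all the data for that denomination is zero. My table is set up in the following way (showing only the columns relevant to the question):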

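A County_Id column,
an Industry_ID column,
a Year column,
a Qtr column,
a Suppressed column (0 for not suppressed, 1 for suppressed),
a Data_Category column (1 for overall, 2 for LQ, and 3 for over the year),
a Data_Denomination column (1-8, identifying which specific data is being looked at in that category, e.g. monthly employment, taxable wages, and other typical data), and
a Value column (which will be zero if the Data_Category is suppressed, since all the data denomination values will be zero).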

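Now, if the overall data (cat 1) for, say, 1991 quarter 1 is suppressed, but the next year's quarter 1 has both overall and over-the-year data (cats 1 and 3) unsuppressed, then we can infer what the suppressed value in the first year would be, because OTY1991q1 = (Overall1991q1 - Overall1990q1). So to find the suppressed data, we would just subtract our cat 1 (denom 1-8) values from our cat 3 (denom 1-8) values to replace the zeros in the suppressed values from the prior year. The math is fairly easy to grasp; the difficulty is that there are millions of rows to check against these criteria. I am trying to write some kind of SQL query that would do this for me: check that Overall-N Qtr-N is suppressed, then see whether the next year has both overall and OTY unsuppressed (maybe with some sort of complicated CASE statement?), and then, if those criteria are met, perform the arithmetic on the two Data_Cat/Data_Denom categories and replace the zero in the corresponding Cat-Denom value.

Below is a simple example (with some data_cats removed) that I hope will help get across what I am trying to do.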

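+----------+------------+------+-----+------------+----------+------------+-------+
| CountyID | IndustryID | Year | Qtr | Suppressed | Data_Cat | Data_Denom | Value |
+----------+------------+------+-----+------------+----------+------------+-------+
|        5 |         10 | 1990 |   1 |          1 |        1 |          1 |     0 |
|        5 |         10 | 1990 |   1 |          1 |        1 |          2 |     0 |
|        5 |         10 | 1990 |   1 |          1 |        1 |          3 |     0 |
|        5 |         10 | 1991 |   1 |          0 |        1 |          1 |     5 |
|        5 |         10 | 1991 |   1 |          0 |        1 |          2 |    15 |
|        5 |         10 | 1991 |   1 |          0 |        1 |          3 |    25 |
|        5 |         10 | 1991 |   1 |          0 |        3 |          1 |    20 |
|        5 |         10 | 1991 |   1 |          0 |        3 |          2 |    20 |
|        5 |         10 | 1991 |   1 |          0 |        3 |          3 |    35 |
+----------+------------+------+-----+------------+----------+------------+-------+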

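So basically what we are trying to do here is take the over-the-year value for each data denomination in 1991 (I removed LQ, data_cat 2, because it is irrelevant here, and I cut the denominations down from 8 to 3 for simplicity) and subtract the overall 1991 value from it, which gives you the prior year's (1990) cat 1 value. So here data_cat 1 Data_denom 1 would be 15 (20 - 5), denom 2 would be 5 (20 - 15), and denom 3 would be 10 (35 - 25). (OTY 1991q1 - Overall 1991q1) = 1990q1. I hope this helps. Like I said, the problem isn't the math; it's formulating a query that will check these criteria millions of times.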
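For anyone who wants to try queries against the sample rows, a minimal setup script might look like the following. This is only a sketch: the table name t is borrowed from the queries in the answers, and the int column types and NOT NULL constraints are assumptions rather than the real QCEW schema.

```sql
-- Hypothetical setup for the sample rows above. The table name t and the
-- int column types are assumptions, not taken from the real QCEW schema.
create table t (
    CountyID   int not null,
    IndustryID int not null,
    Year       int not null,
    Qtr        int not null,
    Suppressed int not null,  -- 0 = not suppressed, 1 = suppressed
    Data_Cat   int not null,  -- 1 = overall, 3 = over the year (2 = LQ left out)
    Data_Denom int not null,  -- cut down from 8 to 3 denominations
    Value      int not null   -- 0 when the category is suppressed
);

insert into t (CountyID, IndustryID, Year, Qtr, Suppressed, Data_Cat, Data_Denom, Value)
values (5, 10, 1990, 1, 1, 1, 1,  0),
       (5, 10, 1990, 1, 1, 1, 2,  0),
       (5, 10, 1990, 1, 1, 1, 3,  0),
       (5, 10, 1991, 1, 0, 1, 1,  5),
       (5, 10, 1991, 1, 0, 1, 2, 15),
       (5, 10, 1991, 1, 0, 1, 3, 25),
       (5, 10, 1991, 1, 0, 3, 1, 20),
       (5, 10, 1991, 1, 0, 3, 2, 20),
       (5, 10, 1991, 1, 0, 3, 3, 35);
```

With this in place, the desired result is that the three 1990 rows end up with values 15, 5, and 10.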

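This would be a lot easier if you could provide some dummy data and your desired output. – iamdave

I have a sample Excel file that I set up to simplify further what I am trying to do, but this is my first time asking a question. Is there a way to share files on this page? –

https://senseful.github.io/web-tools/text-table/ - format it as code in your question – SqlZim

Answer

If you are trying to find suppressed data that has two rows of unsuppressed data for the next year and the same quarter, we could use cross apply() to do something like this:

Test setup: http://rextester.com/ORNCFR23551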

select t.* 
    , NewValue = cat3.value - cat1.value 
from t 
    cross apply (
     select i.value 
     from t as i 
     where i.CountyID = t.CountyID 
     and i.IndustryID = t.IndustryID 
     and i.Data_Denom = t.Data_Denom 
     and i.Year  = t.Year +1 
     and i.Qtr  = t.Qtr 
     and i.Suppressed = 0 
     and i.Data_Cat = 1 
) cat1 
    cross apply (
     select i.value 
     from t as i 
     where i.CountyID = t.CountyID 
     and i.IndustryID = t.IndustryID 
     and i.Data_Denom = t.Data_Denom 
     and i.Year  = t.Year +1 
     and i.Qtr  = t.Qtr 
     and i.Suppressed = 0 
     and i.Data_Cat = 3 
) cat3 
where t.Suppressed = 1 
    and t.Data_Cat = 1

returns:

+----------+------------+------+-----+------------+----------+------------+-------+----------+
| CountyID | IndustryID | Year | Qtr | Suppressed | Data_Cat | Data_Denom | Value | NewValue |
+----------+------------+------+-----+------------+----------+------------+-------+----------+
|        5 |         10 | 1990 |   1 |          1 |        1 |          1 |     0 |       15 |
|        5 |         10 | 1990 |   1 |          1 |        1 |          2 |     0 |        5 |
|        5 |         10 | 1990 |   1 |          1 |        1 |          3 |     0 |       10 |
+----------+------------+------+-----+------------+----------+------------+-------+----------+

Using cross apply() returns only the rows that have a valid derived value.

Using outer apply() returns all rows:

select t.* 
    , NewValue = coalesce(nullif(t.value,0),cat3.value - cat1.value,0) 
from t 
    outer apply (
     select i.value 
     from t as i 
     where i.CountyID = t.CountyID 
     and i.IndustryID = t.IndustryID 
     and i.Data_Denom = t.Data_Denom 
     and i.Year  = t.Year +1 
     and i.Qtr  = t.Qtr 
     and i.Suppressed = 0 
     and i.Data_Cat = 1 
) cat1 
    outer apply (
     select i.value 
     from t as i 
     where i.CountyID = t.CountyID 
     and i.IndustryID = t.IndustryID 
     and i.Data_Denom = t.Data_Denom 
     and i.Year  = t.Year +1 
     and i.Qtr  = t.Qtr 
     and i.Suppressed = 0 
     and i.Data_Cat = 3 
) cat3

returns:

+----------+------------+------+-----+------------+----------+------------+-------+----------+
| CountyID | IndustryID | Year | Qtr | Suppressed | Data_Cat | Data_Denom | Value | NewValue |
+----------+------------+------+-----+------------+----------+------------+-------+----------+
|        5 |         10 | 1990 |   1 |          1 |        1 |          1 |     0 |       15 |
|        5 |         10 | 1990 |   1 |          1 |        1 |          2 |     0 |        5 |
|        5 |         10 | 1990 |   1 |          1 |        1 |          3 |     0 |       10 |
|        5 |         10 | 1991 |   1 |          0 |        1 |          1 |     5 |        5 |
|        5 |         10 | 1991 |   1 |          0 |        1 |          2 |    15 |       15 |
|        5 |         10 | 1991 |   1 |          0 |        1 |          3 |    25 |       25 |
|        5 |         10 | 1991 |   1 |          0 |        3 |          1 |    20 |       20 |
|        5 |         10 | 1991 |   1 |          0 |        3 |          2 |    20 |       20 |
|        5 |         10 | 1991 |   1 |          0 |        3 |          3 |    35 |       35 |
+----------+------------+------+-----+------------+----------+------------+-------+----------+
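To see how the NewValue expression in the outer apply() query picks a value, here is a small sketch with literals standing in for t.value and for the derived cat3.value - cat1.value. The literals come from the sample rows; the column aliases are made up for illustration.

```sql
-- nullif(x, 0) turns a stored 0 into NULL, so coalesce() falls through to the
-- derived value, and finally to 0 when nothing could be derived.
select coalesce(nullif(0, 0), 20 - 5, 0)            as zero_with_derived_value  -- 15
     , coalesce(nullif(5, 0), cast(null as int), 0) as nonzero_stored_value     -- 5
     , coalesce(nullif(0, 0), cast(null as int), 0) as zero_nothing_derived     -- 0
```

One thing that may be worth checking against real data: the outer apply() version does not filter on t.Suppressed or t.Data_Cat, so any row whose stored Value is 0 and that has matching next-year rows would also pick up a derived value, whether or not it was suppressed.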

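Well, sure, but none of the examples from your link actually apply to *this query*. (We aren't joining on an inequality, or doing a 'TOP n', or anything else that can't be handled with an 'INNER JOIN'.) If the physical design is good, 'INNER JOIN' will work fine; if it isn't, no query will perform well. –

Thanks SqlZim, your code works as well. Is there any way to alter it so that this can happen across multiple years? Say the same phenomenon occurs between 1992 and 1993, but you want to update the two inferred values across the four years with the same select statement. Is there a way to do that, or is that asking too much? –

@J.Jack I'm not sure I understand your question. If two consecutive years are suppressed, how would you get the Cat3 value for the suppressed year from the second suppressed year? – SqlZim

Answer

Update 1 - fixed some column names

Update 2 - improved aliases in the second query

OK, I think I get it.

If you're just trying to make this one inference, then the following may help. (If this is just the first of many inferences you want to make to fill data gaps, you may find that a different approach leads to a more efficient solution for doing both/all of them, but I guess we'll cross that bridge when we get there...)

While much of the basic logic stays the same, how you tweak it depends on whether you want a query that just provides the values you would infer (e.g. to drive an UPDATE statement), or whether you want to use this logic inline in a larger query. For performance reasons I suspect the former makes more sense (especially if you can do the update once and then read the resulting data set many times), so I'll start by framing things that way and come back to the other in a moment...

It sounds like you have a single table (I'll call it QCEW) with all of these columns. In that case, use joins to associate each suppressed overall data point (c_oa in the code below) with the corresponding overall and OTY data points from one year later: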

SELECT c_oa.*, n_oa.value - n_oty.value inferred_value
  FROM QCEW c_oa --current yr/qtr overall
       inner join QCEW n_oa --next yr (same qtr) overall
          on c_oa.countyId = n_oa.countyId
         and c_oa.industryId = n_oa.industryId
         and c_oa.year = n_oa.year - 1
         and c_oa.qtr = n_oa.qtr
         and c_oa.data_denom = n_oa.data_denom
       inner join QCEW n_oty --next yr (same qtr) over-the-year
          on c_oa.countyId = n_oty.countyId
         and c_oa.industryId = n_oty.industryId
         and c_oa.year = n_oty.year - 1
         and c_oa.qtr = n_oty.qtr
         and c_oa.data_denom = n_oty.data_denom
 WHERE c_oa.SUPPRESSED = 1
   AND c_oa.DATA_CAT = 1
   AND n_oa.SUPPRESSED = 0
   AND n_oa.DATA_CAT = 1
   AND n_oty.SUPPRESSED = 0
   AND n_oty.DATA_CAT = 3

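Now, it sounds like the table is big, and we've just joined 3 instances of it; so for this to work you'll need good physical design (appropriate indexes/statistics on the join columns, etc.). That's why I suggest doing an update once based on the above query; sure, it may run for a long time, but then you can read the inferred values in no time.

But if you really want to merge this directly into a query of the data, you could modify it to show all of the values, with the inferred values mixed in. We need to switch to outer joins to do this, and I'm going to do some slightly odd things with the join conditions to make it fit together: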
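As a rough sketch of what that one-time update could look like in SQL Server, assuming the QCEW table and column names used above: the index name and its column order are hypothetical and would need checking against the real workload. The arithmetic mirrors the query above (next-year overall minus next-year over-the-year), whereas the question's worked example subtracts in the other direction (cat 3 minus cat 1), so the operands may need swapping depending on which convention the data actually follows.

```sql
-- Hypothetical supporting index: the name and column order are assumptions
-- and should be validated against actual execution plans.
CREATE INDEX IX_QCEW_next_year_lookup
    ON QCEW (countyId, industryId, qtr, data_denom, data_cat, year)
    INCLUDE (suppressed, value);

-- One-time update driven by the same joins as the SELECT above.
-- Only suppressed overall rows with unsuppressed next-year overall and
-- over-the-year rows are touched; everything else is left as-is.
UPDATE c_oa
   SET c_oa.value = n_oa.value - n_oty.value
  FROM QCEW c_oa
       INNER JOIN QCEW n_oa
          ON c_oa.countyId = n_oa.countyId
         AND c_oa.industryId = n_oa.industryId
         AND c_oa.year = n_oa.year - 1
         AND c_oa.qtr = n_oa.qtr
         AND c_oa.data_denom = n_oa.data_denom
       INNER JOIN QCEW n_oty
          ON c_oa.countyId = n_oty.countyId
         AND c_oa.industryId = n_oty.industryId
         AND c_oa.year = n_oty.year - 1
         AND c_oa.qtr = n_oty.qtr
         AND c_oa.data_denom = n_oty.data_denom
 WHERE c_oa.suppressed = 1
   AND c_oa.data_cat = 1
   AND n_oa.suppressed = 0
   AND n_oa.data_cat = 1
   AND n_oty.suppressed = 0
   AND n_oty.data_cat = 3;
```

Running the SELECT version first and comparing a handful of inferred values by hand is a cheap way to confirm the direction of the subtraction before changing any data.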

SELECT src.COUNTYID
     , src.INDUSTRYID
     , src.YEAR
     , src.QTR
     , case when (n_oa.value - n_oty.value) is null
            then src.suppressed
            else 2
       end as SUPPRESSED_CODE -- 0=NOT SUPPRESSED, 1=SUPPRESSED, 2=INFERRED
     , src.DATA_CAT
     , src.DATA_DENOM
     , coalesce(n_oa.value - n_oty.value, src.value) as VALUE
  FROM QCEW src --a source row from which we'll generate a record
       left join QCEW n_oa --next yr (same qtr) overall (if src is suppressed/overall)
         on src.countyId = n_oa.countyId
        and src.industryId = n_oa.industryId
        and src.year = n_oa.year - 1
        and src.qtr = n_oa.qtr
        and src.data_denom = n_oa.data_denom
        and src.SUPPRESSED = 1 and n_oa.SUPPRESSED = 0
        and src.DATA_CAT = 1 and n_oa.DATA_CAT = 1
       left join QCEW n_oty --next yr (same qtr) over-the-year (if src is suppressed/overall)
         on src.countyId = n_oty.countyId
        and src.industryId = n_oty.industryId
        and src.year = n_oty.year - 1
        and src.qtr = n_oty.qtr
        and src.data_denom = n_oty.data_denom
        and src.SUPPRESSED = 1 and n_oty.SUPPRESSED = 0
        and src.DATA_CAT = 1 and n_oty.DATA_CAT = 3

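Will the second set of code work with the table as formatted above? I'm also a bit confused by your use of c_oa as the suppressed overall data point. Can you select something that doesn't exist yet? I'm sorry, but like I said, I only started learning SQL in January, so while I pick things up quickly, there's still a lot I don't know because I haven't had to actually do it. –

So, taking the questions one at a time... –

**Will it work w/ the table as formatted above?** Unless I'm missing something, the above matches my assumptions (except for column names, which I'll update). I'll try to see if I've missed something; you could also test the query against a sample of data and see whether it does what you want. –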
