Rolling Regression Data Frame

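Question:

Please note that this question may have been asked before, but I have not found a clear solution that works with a data frame.

I want to run a rolling linear regression over a 5-day look-back window (kept small so it can be illustrated here).

So far I have tried: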

rollingbeta <- rollapply(df, 
          width=5, 
          FUN = function(Z) 
          { 
          t = lm(formula=y_Close ~ x_Close+0, data = as.data.frame(Z)); 
          return(t$coef)[1] 
          }, 
          by.column=FALSE, align="right",fill = NA) 

    head(rollingbeta,100) 

However, I want a single rolling look-back beta. Instead, the output has 10 columns.

> NCOL(rollingbeta) 
[1] 10 

Can anyone help?

Here is some dummy data (save it as a .txt file and read it in):

df <- read.table("your_dir/df.txt", header=TRUE, sep="", stringsAsFactors=FALSE) 

      Date open.x high.x low.x x_Close volume.x open.y high.y low.y y_Close volume.y x.y.cor 
1451 2010-01-04 57.32 58.13 57.32 57.85 442900 6.61 6.8400 6.61 6.83 833100  NA 
1452 2010-01-05 57.90 58.33 57.54 58.20 436900 6.82 7.1200 6.80 7.12 904500  NA 
1453 2010-01-06 58.20 58.56 58.01 58.42 850600 7.05 7.3800 7.05 7.27 759800  NA 
1454 2010-01-07 58.31 58.41 57.14 57.90 463600 7.24 7.3000 7.06 7.11 557800  NA 
1455 2010-01-08 57.45 58.62 57.45 58.47 206500 7.08 7.3500 6.95 7.29 588100  NA 
1456 2010-01-11 58.79 59.00 57.22 57.73 331900 7.38 7.4500 7.17 7.22 450500  NA 
1457 2010-01-12 57.20 57.21 56.15 56.34 428500 7.15 7.1900 6.87 7.00 694700  NA 
1458 2010-01-13 56.32 56.66 54.83 56.56 577500 7.05 7.1700 6.98 7.15 528800  NA 
1459 2010-01-14 56.51 57.05 55.37 55.53 368100 7.08 7.1701 7.08 7.11 279900  NA 
1460 2010-01-15 56.59 56.59 55.19 55.84 417900 7.03 7.0500 6.95 7.03 407600  NA 

The output for the first rolling linear regression should be:

NA NA NA NA NA 0.1229065 

Consider using the roll package.

library(magrittr); requireNamespace("roll") 
ds <- readr::read_csv(
    "  Date, open.x, high.x, low.x, x_Close, volume.x, open.y, high.y, low.y, y_Close, volume.y 
    2010-01-04, 57.32, 58.13, 57.32, 57.85, 442900, 6.61, 6.8400, 6.61, 6.83, 833100 
    2010-01-05, 57.90, 58.33, 57.54, 58.20, 436900, 6.82, 7.1200, 6.80, 7.12, 904500 
    2010-01-06, 58.20, 58.56, 58.01, 58.42, 850600, 7.05, 7.3800, 7.05, 7.27, 759800 
    2010-01-07, 58.31, 58.41, 57.14, 57.90, 463600, 7.24, 7.3000, 7.06, 7.11, 557800 
    2010-01-08, 57.45, 58.62, 57.45, 58.47, 206500, 7.08, 7.3500, 6.95, 7.29, 588100 
    2010-01-11, 58.79, 59.00, 57.22, 57.73, 331900, 7.38, 7.4500, 7.17, 7.22, 450500 
    2010-01-12, 57.20, 57.21, 56.15, 56.34, 428500, 7.15, 7.1900, 6.87, 7.00, 694700 
    2010-01-13, 56.32, 56.66, 54.83, 56.56, 577500, 7.05, 7.1700, 6.98, 7.15, 528800 
    2010-01-14, 56.51, 57.05, 55.37, 55.53, 368100, 7.08, 7.1701, 7.08, 7.11, 279900 
    2010-01-15, 56.59, 56.59, 55.19, 55.84, 417900, 7.03, 7.0500, 6.95, 7.03, 407600" 
) 

runs <- roll::roll_lm(
    x   = as.matrix(ds$x_Close), 
    y   = as.matrix(ds$y_Close), 
    width  = 5, 
    intercept = FALSE 
) 

# Nested in a named-column, within a matrix, within a list. 
ds$beta <- runs$coefficients[, "x1"] 

ds$beta 
# [1]  NA  NA  NA  NA 0.1224813 
# [6] 0.1238653 0.1242478 0.1246279 0.1256553 0.1259121 
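
As a cross-check that needs no package, the slope of a no-intercept regression has the closed form sum(x * y) / sum(x^2) within each window. The sketch below applies that formula to the x_Close and y_Close values from the data above, typed in by hand. The names x, y, width, and beta_check are illustrative and do not appear in the original code.

```r
# Closing prices copied from the posted data (rows 2010-01-04 to 2010-01-15).
x <- c(57.85, 58.20, 58.42, 57.90, 58.47, 57.73, 56.34, 56.56, 55.53, 55.84)
y <- c(6.83, 7.12, 7.27, 7.11, 7.29, 7.22, 7.00, 7.15, 7.11, 7.03)

width <- 5
n <- length(x)

# Right-aligned windows: the first (width - 1) positions stay NA.
beta_check <- rep(NA_real_, n)
for (i in width:n) {
  idx <- (i - width + 1):i
  # Least-squares slope through the origin for this window.
  beta_check[i] <- sum(x[idx] * y[idx]) / sum(x[idx]^2)
}

print(round(beta_check, 7))
```

If the fifth and sixth values agree with the roll_lm output above, the two approaches are fitting the same windows.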

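Double-check the alignment of the variables in your dataset. x_Close is around 50, while y_Close is around 7. That may explain the small difference between the expected value above, 0.1229065, and 0.1224813.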
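
If you would rather stay with rollapply from the zoo package, as in the question, one option is to pass it only the two numeric columns and return the coefficient with coef(). This is a sketch under that assumption, not a diagnosis of why the original call produced 10 columns. The data frame prices is a hypothetical stand-in built from the posted values.

```r
library(zoo)

# Hypothetical stand-in for the two relevant columns of the posted data.
prices <- data.frame(
  x_Close = c(57.85, 58.20, 58.42, 57.90, 58.47, 57.73, 56.34, 56.56, 55.53, 55.84),
  y_Close = c(6.83, 7.12, 7.27, 7.11, 7.29, 7.22, 7.00, 7.15, 7.11, 7.03)
)

# rollapplyr() is rollapply() with align = "right".
# by.column = FALSE hands each 5-row window to FUN as a whole,
# so lm() can see both columns at once.
rollingbeta <- rollapplyr(
  prices,
  width = 5,
  FUN = function(z) coef(lm(y_Close ~ x_Close + 0, data = as.data.frame(z))),
  by.column = FALSE,
  fill = NA
)

# Flatten to a plain numeric vector, one slope per row.
rollingbeta <- as.numeric(rollingbeta)
print(rollingbeta)
```

Keeping the Date column out of the call means every window is purely numeric, and returning coef() directly avoids the misplaced [1] after return() in the original function.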
