An alternative way to get the standardization parameters is to recompute them directly:
# Per-industry mean and sd of the log1p-transformed Turnover
tmp <- monthly_retail_tbl %>%
  group_by(Industry) %>%
  arrange(Month) %>%
  mutate(Turnover = log1p(x = Turnover)) %>%
  group_map(~ c(mean = mean(.x$Turnover, na.rm = TRUE),
                sd   = sd(.x$Turnover, na.rm = TRUE))) %>%
  bind_rows()
# One value per industry, in group_by() order
std_mean <- tmp$mean
std_sd <- tmp$sd
rm(tmp)
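As a rough sketch of why these two numbers matter: they are what you need to map standardized values back to the original scale later on (in timetk that is usually done with standardize_inv_vec() followed by expm1()). The base-R round trip below only illustrates the arithmetic; the Turnover values and the demo_* names are made up for the example and are not from the post.

```r
# Made-up Turnover values for a single industry (illustration only).
demo_turnover <- c(120.5, 98.2, 143.7, 110.0, 131.4)

# Forward transform: log1p, then standardize to mean 0 and sd 1.
demo_logged <- log1p(demo_turnover)
demo_mean   <- mean(demo_logged)
demo_sd     <- sd(demo_logged)
demo_std    <- (demo_logged - demo_mean) / demo_sd

# Inverse transform: undo the standardization, then undo log1p.
demo_back <- expm1(demo_std * demo_sd + demo_mean)

# TRUE if the round trip recovers the original values (up to floating point).
isTRUE(all.equal(demo_back, demo_turnover))
```

With several industries, the same inversion would be applied per group, using the mean and sd that belong to that group.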
Hi, one could also capture the messages that standardize_vec() prints, using capture.output(), like this:
# Build the per-industry data and capture the messages from standardize_vec()
stdout <- capture.output(groups <- lapply(X = seq_along(Industries), FUN = function(x){
  monthly_retail_tbl %>%
    filter(Industry == Industries[x]) %>%
    arrange(Month) %>%
    mutate(Turnover = log1p(x = Turnover)) %>%
    mutate(Turnover = standardize_vec(Turnover)) %>%
    future_frame(Month, .length_out = "12 months", .bind_data = TRUE) %>%
    mutate(Industry = Industries[x]) %>%
    tk_augment_fourier(.date_var = Month, .periods = 12, .K = 1) %>%
    tk_augment_lags(.value = Turnover, .lags = 12) %>%
    tk_augment_slidify(.value = Turnover_lag12,
                       .f = ~ mean(.x, na.rm = TRUE),
                       .period = c(3, 6, 9, 12),
                       .partial = TRUE,
                       .align = "center")
}), type = "message")
And then extract the mean and sd values from the captured messages automatically:
collect_vals <- function(stdout, string = "mean: "){
  stdout %>%
    grep(string, ., value = TRUE) %>%
    gsub(string, "", .) %>%
    as.numeric()
}
std_mean <- stdout %>% collect_vals("mean: ")
std_sd <- stdout %>% collect_vals("standard deviation: ")
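To make the parsing step easier to follow, here is a small self-contained sketch that runs the same grep/gsub idea on a few hand-written lines. The sample lines and their numbers are made up; they only imitate the "mean: " and "standard deviation: " prefixes that the function above looks for. The names sample_stdout, collect_vals_base and demo_* are hypothetical, and the helper is a base-R variant written without the pipe.

```r
# Hypothetical captured lines (made-up numbers), two groups' worth.
sample_stdout <- c(
  "Standardization Parameters",
  "mean: 2.5",
  "standard deviation: 0.75",
  "Standardization Parameters",
  "mean: 3.25",
  "standard deviation: 1.5"
)

# Base-R variant of collect_vals(): keep the lines containing the prefix,
# strip the prefix, and convert what is left to numeric.
collect_vals_base <- function(stdout, string = "mean: ") {
  hits <- grep(string, stdout, value = TRUE, fixed = TRUE)
  as.numeric(sub(string, "", hits, fixed = TRUE))
}

demo_mean <- collect_vals_base(sample_stdout, "mean: ")
demo_sd   <- collect_vals_base(sample_stdout, "standard deviation: ")
```

The values come back in the order the messages were printed, so they line up with the order of the lapply() loop over Industries.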
Bebo E
Hi, great work! I wanted to quickly ask: when you augment lags and rolling features and then split into train and test sets, isn't there significant leakage into the test set, since the rolling features see the actual Turnover values?