[CMU & AWS 2020] Forecasting Big Time Series: Theory and Practice (Part I)
Talk title: Forecasting Big Time Series: Theory and Practice
Date: 2020.04
Tutorial page: https://lovvge.github.io/Forecasting-Tutorial-WWW-2020/
Recording: https://www.amazon.science/videos-and-tutorials/forecasting-big-time-series-theory-and-practice
Presenters:
Christos Faloutsos (CMU and Amazon)
Valentin Flunkert (AWS AI Labs)
Jan Gasthaus (AWS AI Labs)
Tim Januschowski (AWS AI Labs)
Yuyang (Bernie) Wang (AWS AI Labs)
Abstract
Time series forecasting is a key ingredient in the automation and optimization of business processes: in retail, deciding which products to order and where to store them depends on forecasts of future demand in different regions; in cloud computing, the estimated future usage of services and infrastructure components guides capacity planning; and workforce scheduling in warehouses and factories requires forecasts of the future workload. Recent years have witnessed a paradigm shift in forecasting techniques and applications, from computer-assisted model- and assumption-based to data-driven and fully automated. This shift can be attributed to the availability of large, rich, and diverse time series data sources, and it results in a set of challenges that need to be addressed, such as the following. How can we build statistical models that learn efficiently and effectively to forecast from large and diverse data sources? How can we leverage the statistical power of "similar" time series to improve forecasts when observations are limited? What are the implications for building forecasting systems that can handle large data volumes?
The objective of this tutorial is to provide a concise and intuitive overview of the most important methods and tools available for solving large-scale forecasting problems. We review the state of the art in both (1) classical modeling of time series and (2) deep learning for forecasting. We also discuss the practical aspects of building a large-scale forecasting system, including data integration, feature generation, a backtesting framework, and error tracking and analysis. The practical side is accompanied by a hands-on session, in which we engage the audience with Jupyter notebooks that demonstrate the key concepts from the theory part. Furthermore, we provide interactive demos showing various avenues for solving business problems with AWS forecasting offerings such as GluonTS, DeepAR (SageMaker), and Amazon Forecast.
Keywords: Forecasting, Neural Network, Time Series
Outline
Part 1 – Fundamentals
- P1.1. Similarity search: Euclidean/time-warping; feature extraction and SAMs
- P1.2. Periodicities: DFT/DWT
- P1.3. Linear Forecasting: AR (Box-Jenkins)
- P1.4. Non-linear forecasting: lag-plots; gray-box modeling: Lotka-Volterra
- P1.5. Tensors: PARAFAC
Motivation - Applications
- Financial, sales, economic series
- Medical
– reactions to new drugs
– elderly care
- Civil/automobile infrastructure
– bridge vibrations [Oppenheim+02]
– road conditions / traffic monitoring
- Weather, environment/anti-pollution
– volcano monitoring
– air/water pollutant monitoring
– sunspots
- Computer systems
– web servers (caching, prefetching)
– network traffic monitoring
– …
Problem #1
Goal: given a signal (e.g., #packets over time)
Find: patterns, periodicities, and/or compress
Problem #2: Forecast
Problem #2’: Similarity search
Problem #3
Important observations
P1.1. Similarity Search and Indexing
- distance functions: Euclidean; time-warping
- indexing
- feature extraction
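The two distance functions named above can be contrasted with a small sketch. This is an illustration under assumptions, not code from the tutorial: the function names `euclidean_distance` and `dtw_distance` are hypothetical, the time-warping variant is the textbook dynamic-programming formulation with squared point costs and no warping-window constraint, and the two toy series are made up.

```python
import math


def euclidean_distance(x, y):
    """Point-by-point distance; requires sequences of equal length."""
    if len(x) != len(y):
        raise ValueError("Euclidean distance needs equal-length sequences")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def dtw_distance(x, y):
    """Dynamic time warping distance (unconstrained, squared point costs)."""
    n, m = len(x), len(y)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of x[:i] with y[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (x[i - 1] - y[j - 1]) ** 2
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return math.sqrt(cost[n][m])


# Two toy series with the same bump, shifted by one time step.
a = [0, 0, 1, 2, 1, 0, 0]
b = [0, 1, 2, 1, 0, 0, 0]

euclid = euclidean_distance(a, b)
warped = dtw_distance(a, b)
print(euclid, warped)
```

On these toy series the Euclidean distance penalizes the one-step shift, while time warping can stretch the time axis to align the bumps. The quadratic cost of the warping table is one reason indexing and feature extraction matter for large collections.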
Idea: ‘GEMINI’
How to extract ‘good features’?
- Q: how to extract features (commonalities)? (given the data)
- A: SVD, ICA
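As a rough illustration of the SVD answer, the sketch below finds the dominant common pattern across several series and uses the projection onto it as a one-number feature per series. Everything here is an assumption made for illustration: the helper names, the toy data, and the use of plain power iteration on A^T A (in practice a library routine such as `numpy.linalg.svd` would be the usual choice). ICA is not shown.

```python
import math


def top_right_singular_vector(rows, iterations=100):
    """Approximate the dominant right singular vector by power iteration on A^T A.

    Assumes the all-ones starting vector is not orthogonal to the answer.
    """
    n_cols = len(rows[0])
    v = [1.0 / math.sqrt(n_cols)] * n_cols
    for _ in range(iterations):
        # u = A v
        u = [sum(a * b for a, b in zip(row, v)) for row in rows]
        # w = A^T u
        w = [sum(rows[i][j] * u[i] for i in range(len(rows))) for j in range(n_cols)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v


def project(row, v):
    """Feature value of one series: its coordinate along the pattern v."""
    return sum(a * b for a, b in zip(row, v))


# Toy data: three series (rows) that are scaled copies of one pattern.
rows = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],
    [3.0, 6.0, 9.0, 12.0],
]

v = top_right_singular_vector(rows)
features = [project(r, v) for r in rows]
print(v)
print(features)
```

Because the toy series share a single pattern, one feature per series captures them fully; with real data one would keep the first few singular vectors and accept some approximation error.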
P1.2. DSP (Digital Signal Processing)
DFT
DFT highlights the periodicities.
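A minimal sketch of how the DFT exposes a periodicity, assuming a synthetic signal invented for this example (a strong period-8 sine plus a weaker period-4 sine). The naive O(n^2) transform is written out for clarity; the function names `dft` and `dominant_period` are hypothetical, and a real pipeline would use an FFT routine instead.

```python
import cmath
import math


def dft(signal):
    """Naive discrete Fourier transform (O(n^2)), for illustration only."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def dominant_period(signal):
    """Period (in samples) of the strongest non-constant frequency."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]  # remove the constant (k = 0) component
    amplitudes = [abs(c) for c in dft(centered)]
    # For a real-valued signal only frequencies 1..n/2 carry distinct information.
    k = max(range(1, n // 2 + 1), key=lambda i: amplitudes[i])
    return n / k


# Synthetic signal: strong period-8 component plus a weaker period-4 component.
signal = [
    math.sin(2 * math.pi * t / 8) + 0.3 * math.sin(2 * math.pi * t / 4)
    for t in range(64)
]

period = dominant_period(signal)
print(period)
```

The amplitude spectrum of this signal has two spikes, at frequencies 8 and 16 out of 64 samples, and the larger one corresponds to the period-8 component.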
DFT - Conclusions
DWT
Goal: given a signal (e.g., #packets over time)
Find: patterns, periodicities, and/or compress
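For the DWT, one simple way to see the "patterns and/or compress" goal is the Haar wavelet, sketched below. This is an assumption-laden illustration: it uses the unnormalized averaging/differencing form of the Haar transform, the names `haar_dwt` and `inverse_haar` are hypothetical, the input length must be a power of two, and the toy signal is made up.

```python
def haar_dwt(signal):
    """Full Haar decomposition by repeated pairwise averaging and differencing.

    Returns [overall average, coarsest detail, ..., finest details].
    """
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("signal length must be a power of two")
    approx = [float(x) for x in signal]
    details = []
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        # Coarser levels are placed in front of the finer ones.
        details = [(a - b) / 2 for a, b in pairs] + details
        approx = [(a + b) / 2 for a, b in pairs]
    return approx + details


def inverse_haar(coeffs):
    """Rebuild the signal from the coefficients produced by haar_dwt."""
    approx = coeffs[:1]
    pos = 1
    while pos < len(coeffs):
        details = coeffs[pos:pos + len(approx)]
        pos += len(approx)
        # Each (average, difference) pair expands back into two samples.
        approx = [v for a, d in zip(approx, details) for v in (a + d, a - d)]
    return approx


signal = [4, 6, 10, 12, 8, 6, 5, 5]
coeffs = haar_dwt(signal)
restored = inverse_haar(coeffs)

# Crude compression: zero out the small detail coefficients, then reconstruct.
compressed = [c if abs(c) >= 2 else 0.0 for c in coeffs]
approximation = inverse_haar(compressed)
print(coeffs)
print(approximation)
```

Unlike the DFT, each Haar coefficient is localized in time, so a short burst affects only a few coefficients; dropping small coefficients gives a coarse, piecewise-constant approximation of the signal.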