Time-Series: The Power of Forecasting
Data Science Digest, Vol. 4
This article is part of our “Data Science Digest” series. With this series, we will help you keep up with the developments in data science, show you the potential of data science techniques, and give you a sneak peek into some of the exciting things we’ve been working on at Anchormen. In this article, we will talk about time series, different forecasting models, and their applications.
In its simplest form, a time series is a set of data points listed or plotted in time order. The data points are sequential, are taken at equally spaced points in time, and carry some meaning. The most frequent example, which you have probably seen, is sales over a period of time (days, weeks, months, etc.). Time series analysis is a collection of methods for extracting meaningful results and patterns from such data.
A time series on its own is not very remarkable, but when it is combined with forecasting models, some very interesting possibilities arise. In the following paragraphs, we will briefly discuss some of the models we’ve been using, some Anchormen use cases, and one of the difficulties of predicting results with time series: spurious correlation. So, let’s dig into it.
When talking about understanding and forecasting data, we have to mention ARIMA. It is one of the classical models and is still widely used. ARIMA combines three components. AR (the autoregressive component) indicates that the value of a variable depends linearly on its own past values. I (the integrated component) refers to differencing the series, that is, replacing each value with its change from the previous value, a trick that can make a time series stationary. MA (the moving average component) indicates that the value of a time series depends linearly on the current error term and on lagged versions of the error term.
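To make the AR and I components a little more concrete, here is a minimal sketch in plain Python. It is not a full ARIMA implementation (the MA component is left out), and the simulated data, the coefficient 0.7, and the helper names `difference` and `fit_ar1` are our own assumptions for illustration.

```python
import random

def difference(series):
    """The "I" step: replace each value with its change from the previous one."""
    return [cur - prev for prev, cur in zip(series, series[1:])]

def fit_ar1(series):
    """Least-squares estimate of phi in x[t] = phi * x[t-1] + error.

    Assumes the series is roughly zero-mean, so no intercept is fitted.
    """
    numerator = sum(prev * cur for prev, cur in zip(series, series[1:]))
    denominator = sum(prev * prev for prev in series[:-1])
    return numerator / denominator

random.seed(0)  # fixed seed so the sketch is repeatable

# Simulate a stationary AR(1) series with a known coefficient (illustrative value).
PHI_TRUE = 0.7
changes = [0.0]
for _ in range(2000):
    changes.append(PHI_TRUE * changes[-1] + random.gauss(0, 1))

# Integrate it: the running total wanders and is no longer stationary.
levels = [0.0]
for change in changes:
    levels.append(levels[-1] + change)

# Differencing undoes the integration; the AR fit then runs on the differenced data.
recovered = difference(levels)
phi_hat = fit_ar1(recovered)

# One-step forecast on the original scale: last level plus the predicted change.
next_level = levels[-1] + phi_hat * recovered[-1]
print(f"estimated phi: {phi_hat:.2f}, next level forecast: {next_level:.2f}")
```

In practice you would normally rely on a tested library rather than hand-rolled estimators; the point here is only to show what "differencing" and "depends linearly on its past values" mean in code.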
More recently, we’ve been using Prophet for time series forecasting as well. Prophet is the open-source forecasting tool introduced by Facebook last year. The tool aims to automate forecasting techniques and to be intuitive and easy for a wider audience to use.
Another model for forecasting is the recurrent neural network (RNN). The model approximates a mapping function from input variables to output variables. This is valuable because it is robust to noise and can capture nonlinear behavior: neural networks do not make strong assumptions about the mapping function and can learn both linear and nonlinear relationships.
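Before any neural network can learn such a mapping, the series has to be framed as input and output pairs. One common way to do this, sketched below, is a sliding window: the last few observations are the input and the next observation is the output. The function name `make_windows`, the window size, and the sales numbers are made up for illustration, and the network itself is not shown.

```python
def make_windows(series, window):
    """Turn a series into (inputs, target) pairs using a sliding window."""
    pairs = []
    for start in range(len(series) - window):
        inputs = series[start:start + window]   # the last `window` observations
        target = series[start + window]         # the value the model should predict
        pairs.append((inputs, target))
    return pairs

daily_sales = [12, 15, 14, 18, 21, 19, 24]  # made-up numbers

for inputs, target in make_windows(daily_sales, window=3):
    print(inputs, "->", target)
```

Each printed line is one training example: the model sees the inputs on the left and is asked to approximate the value on the right.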
Any of these models can provide you with a forecast (depending on the use case, some might be better than others), but what enables Anchormen to consistently provide a high degree of accuracy is that we use cross-validation, scoring each candidate on data held out from fitting, to select the best model.
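As a rough illustration of model selection by cross-validation on a time series, the sketch below uses a rolling-origin scheme: each candidate forecasts the next point using only the data before it, and the candidate with the lowest mean absolute error wins. The three toy forecasters, the weekly pattern, and all names are assumptions made for this example; they stand in for the real models and are not a description of our production setup.

```python
def naive(history):
    """Forecast the next value as the last observed value."""
    return history[-1]

def mean_forecast(history):
    """Forecast the next value as the average of everything seen so far."""
    return sum(history) / len(history)

def seasonal_naive(history, period=7):
    """Forecast the next value as the value one season (here: one week) ago."""
    return history[-period]

def rolling_origin_mae(series, forecaster, min_train):
    """Mean absolute error of one-step forecasts, never using future data."""
    errors = []
    for t in range(min_train, len(series)):
        prediction = forecaster(series[:t])  # only the past is visible
        errors.append(abs(series[t] - prediction))
    return sum(errors) / len(errors)

def select_model(series, candidates, min_train):
    """Score every candidate and return the name with the lowest error."""
    scores = {
        name: rolling_origin_mae(series, forecaster, min_train)
        for name, forecaster in candidates.items()
    }
    best = min(scores, key=scores.get)
    return best, scores

# Made-up daily sales with a strong weekly pattern (weekend peak).
weekly_pattern = [10, 12, 11, 13, 15, 30, 28]
series = weekly_pattern * 8

candidates = {
    "naive": naive,
    "mean": mean_forecast,
    "seasonal_naive": seasonal_naive,
}

best, scores = select_model(series, candidates, min_train=14)
print("best model:", best)
for name, score in scores.items():
    print(f"  {name}: MAE = {score:.2f}")
```

Note that the folds respect time order. Shuffling the data, as is common in ordinary cross-validation, would let a model peek at the future.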
For our clients, we’ve developed a number of time series analysis models with different applications. Most frequently, we help large organizations with complex sales patterns to forecast short-term and long-term sales performance. Through a combination of classical time series models and new technology (such as neural networks), we are able to estimate an organization’s future sales performance over a period of up to 6 months.
Another use case we’ve worked on is stock optimization. This solution is aimed at companies with multiple warehouses and a large variety of products. By analyzing product usage, purchase behavior, and seasonal peaks, we can build a forecasting model for organizations that want to predict when a certain product or article will run out of stock and which delivery method is optimal without leading to extra logistics costs.
We’ve also used trigger models for anomaly detection. If your company’s products or services are highly dependent on specific peaks or events, time series forecasting lets us determine when such an event will occur, so that the organization can make the necessary preparations or boost marketing and sales efforts to take advantage of it.
One interesting issue when forecasting with time series is spurious correlation. This occurs when two variables that are not related to each other appear to be related, due to coincidence or to a third, unseen factor. Here are two examples of this phenomenon: one where there is a third factor and one where it’s a simple coincidence.
One study found a strong correlation between ice cream sales and the number of shark attacks at a sample of beaches. The tempting conclusion would be that increasing ice cream sales cause more shark attacks. But in this case, the better explanation is that there is a third variable: temperature. Warmer temperatures cause ice cream sales to go up. Warmer temperatures also bring more people to the beaches, increasing the chances of shark attacks. Correlation is not causation.
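The effect of a hidden third variable is easy to reproduce with simulated data. In the sketch below, temperature drives both ice cream sales and shark attacks, and the two never influence each other. All numbers are invented for illustration and are not taken from the study. The raw correlation comes out strong, but it largely disappears once the part explained by temperature is removed from both series.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def residuals(ys, xs):
    """What is left of ys after removing a straight-line fit on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var_x
    intercept = mean_y - slope * mean_x
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]

random.seed(1)  # fixed seed so the sketch is repeatable

# Invented data: temperature is the only driver of both series.
temperature = [random.uniform(10, 35) for _ in range(500)]
ice_cream = [20 * t + random.gauss(0, 50) for t in temperature]
shark_attacks = [0.2 * t + random.gauss(0, 1) for t in temperature]

raw = pearson(ice_cream, shark_attacks)
controlled = pearson(
    residuals(ice_cream, temperature),
    residuals(shark_attacks, temperature),
)

print(f"raw correlation:               {raw:.2f}")
print(f"after removing temperature:    {controlled:.2f}")
```

Of course, this only works when the third variable is known and measured. In real projects, finding it is usually the hard part.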
Another interesting correlation: years in which Nicolas Cage starred in more movies are also years with a higher number of deaths by getting tangled in bed sheets. So, does Nicolas Cage cause people to get caught in their bed sheets? This is probably just a coincidence. These two variables might show a correlation purely by chance. But you never know…
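How likely is such a coincidence? More likely than you might think, if you compare enough pairs of short series. The sketch below generates one random "target" series of 10 yearly values and then searches 1000 equally random, unrelated candidates for the one that correlates best with it. The series length, the number of candidates, and the seed are arbitrary choices for illustration.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

random.seed(2)  # fixed seed so the sketch is repeatable

YEARS = 10         # short series, like yearly statistics
CANDIDATES = 1000  # number of unrelated series we compare against

target = [random.gauss(0, 1) for _ in range(YEARS)]

best = 0.0
for _ in range(CANDIDATES):
    candidate = [random.gauss(0, 1) for _ in range(YEARS)]  # pure noise
    r = pearson(target, candidate)
    if abs(r) > abs(best):
        best = r

print(f"strongest correlation found among pure noise: {best:.2f}")
```

Every series here is pure noise, yet the best match looks impressive. The more variables you screen, the more "interesting" correlations you will find by chance alone.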