This three-part write-up [Part II, Part III] is my attempt at a down-to-earth explanation (and Python code) of the Holt-Winters method for those of us who, while hypothetically quite good at math, still try to avoid it at every opportunity. I had to dive into this subject while tinkering on tgres (which features a Golang implementation). And having found it somewhat complex (and yet so brilliantly simple), I figured that it’d be good to share this knowledge, and in the process, to hopefully solidify it in my head as well.
Triple Exponential Smoothing, also known as the Holt-Winters method, is one of the many methods or algorithms that can be used to forecast data points in a series, provided that the series is “seasonal”, i.e. repetitive over some period.
Exponential smoothing in some form or another dates back to the work of Siméon Poisson (1781–1840), while its application in forecasting appears to have been pioneered over a century later, in 1956, by Robert Brown (1923–2013) in his publication Exponential Smoothing for Predicting Demand (Cambridge, Massachusetts). [Based on the URL it seems Brown was working on forecasting tobacco demand?]
In 1957 an MIT and University of Chicago graduate, professor Charles C Holt (1921–2010), was working at CMU (then known as CIT) on forecasting trends in production, inventories and labor force. It appears that Holt and Brown worked independently and knew not of each other’s work. Holt published a paper “Forecasting trends and seasonals by exponentially weighted moving averages” (Office of Naval Research Research Memorandum No. 52, Carnegie Institute of Technology) describing double exponential smoothing. Three years later, in 1960, a student of Holt’s (?), Peter R. Winters, improved the algorithm by adding seasonality and published Forecasting sales by exponentially weighted moving averages (Management Science 6, 324–342), citing Dr. Holt’s 1957 paper as earlier work on the same subject. This algorithm became known as triple exponential smoothing or the Holt-Winters method, the latter probably because it was described in a 1960 Prentice-Hall book “Planning Production, Inventories, and Work Force” by Holt, Modigliani, Muth, Simon, Bonini and Winters. Good luck finding a copy!
Curiously, I’ve not been able to find any personal information on Peter R. Winters online. If you find anything, please let me know and I’ll add a reference here.
In 2000 the Holt-Winters method became well known in ISP circles at the height of the .com boom, when Jake D. Brutlag (then of WebTV) published Aberrant Behavior Detection in Time Series for Network Monitoring (Proceedings of the 14th Systems Administration Conference, LISA 2000). It described how an open source C implementation [link to the actual commit] of a variant of the Holt-Winters seasonal method, which he contributed as a feature to RRDTool (very popular at ISPs), could be used to monitor network traffic.
In 2003, a remarkable 40+ years after the publication of Winters’ paper, professor James W Taylor of Oxford University extended the Holt-Winters method to multiple seasonalities (i.e. $n$th exponential smoothing) and published Short-term electricity demand forecasting using double seasonal exponential smoothing (Journal of the Operational Research Society, vol. 54, pp. 799–805). (But we won’t cover Taylor’s method here.)
In 2011 the RRDTool implementation contributed by Brutlag was ported to Graphite by Matthew Graham, thus making it even more popular in the devops community.
So… how does it work?
The best way to explain triple exponential smoothing is to gradually build up to it, starting with the simplest forecasting methods. Lest this text get too long, we will stop at triple exponential smoothing, though quite a few other methods are known.
I used mathematical notation only where I thought it made the most sense, sometimes accompanied by an “English translation”, and where appropriate supplemented with a bit of Python code. In Python I refrain from using any non-standard packages, keeping the examples plain. I chose not to use generators, for clarity. The objective here is to explain the inner workings of the algorithm so that you can implement it yourself in whatever language you prefer.
I also hope to demonstrate that this is simple enough that you do not need to resort to SciPy or whatever (not that there is anything wrong with that).
But First, Some Terminology
Series
The main subject here is a series. In the real world we are most likely to be applying this to a time series, but for this discussion the time aspect is irrelevant. A series is merely an ordered sequence of numbers. We might be using words that are chronological in nature (past, future, yet, already, time even!), but only because it makes it easier to understand. So forget about time, timestamps and intervals: time does not exist. The only property each data point has (other than the value) is its order: first, next, previous, last, etc.
It is useful to think of a series as a list of two-dimensional $x,y$ coordinates, where $x$ is order (always going up by 1) and $y$ is value. For this reason in our math formulas we will be sticking to $y$ for value and $x$ for order.
Observed vs Expected
Forecasting is estimating values that we do not yet know based on the values we do know. The values we know are referred to as observed, while the values we forecast are referred to as expected. The math convention to denote expected values is with the circumflex, a.k.a. “hat”: $\hat{y}$.
For example, if we have a series that looks like [1, 2, 3], we might forecast the next value to be 4. Using this terminology, given the observed series [1, 2, 3], the next expected value $\hat{y}_4$ is 4.
Method
We may have intuited, based on [1, 2, 3], that in this series each value is 1 greater than the previous, which in math notation can be expressed as $\hat{y}_{x + 1} = y_x + 1$. This equation, the result of our intuition, is known as a forecast method.
If our method is correct then the next observed value would indeed be 4, but if [1, 2, 3] is actually part of a Fibonacci sequence, then where we expected $\hat{y}_4 = 4$, we would observe $y_4 = 5$. Note the hatted $\hat{y}$ (expected) in the former and $y$ (observed) in the latter expression.
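To make the distinction concrete, here is a tiny Python sketch; add_one_method is a hypothetical name for the method intuited above, and the Fibonacci rule stands in for what we would actually observe.

```python
# Hypothetical helper: the method intuited from [1, 2, 3],
# i.e. the next expected value is the last observed value plus 1.
def add_one_method(series):
    return series[-1] + 1

observed = [1, 2, 3]
expected_y4 = add_one_method(observed)  # the forecast, y-hat_4

# If [1, 2, 3] is really part of a Fibonacci sequence, each value is
# the sum of the two before it, so the next observed value is 2 + 3.
observed_y4 = observed[-2] + observed[-1]

print(expected_y4, observed_y4)  # 4 5
```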
Error, SSE and MSE
It is perfectly normal to compute expected values where we already have observed values. Comparing the two lets you compute the error, which is the difference between observed and expected and is an indispensable indication of the accuracy of the method.
Since the difference can be negative or positive, the common convention is to use the absolute value or to square the error so that the number is never negative. For a whole series the squared errors are typically summed, resulting in the Sum of Squared Errors (SSE). Sometimes you may come across the Mean Squared Error (MSE), which is the SSE divided by the number of errors.
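A minimal sketch of these measures in plain Python. The observed and expected values are illustrative only (they reuse the Fibonacci example, where 4 was expected and 5 observed), MSE is taken here as the SSE divided by the number of points compared, and the function names are hypothetical.

```python
# Error at each point: observed minus expected.
def errors(observed, expected):
    return [y - y_hat for y, y_hat in zip(observed, expected)]

# Sum of Squared Errors: squaring makes every term non-negative.
def sse(observed, expected):
    return sum(e ** 2 for e in errors(observed, expected))

# Mean Squared Error: the SSE averaged over the number of points compared.
def mse(observed, expected):
    return sse(observed, expected) / len(observed)

observed = [1, 2, 3, 5]   # illustrative values only
expected = [1, 2, 3, 4]

print(errors(observed, expected))  # [0, 0, 0, 1]
print(sse(observed, expected))     # 1
print(mse(observed, expected))     # 0.25
```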
And Now the Methods (where the fun begins!)
In the next few examples we are going to be using this tiny series:
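The original listing of the series did not survive. As an assumption, the series below can stand in for it; its last value, 12, agrees with the naive forecast given later in the text.

```python
# Assumed example series (the original listing was lost).
series = [3, 10, 12, 13, 12, 10, 12]
print(len(series), series[-1])  # 7 12
```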
(Feel free to paste it and any of the following code snippets into your Python repl.)
Naive Method
This is the most primitive forecasting method. The premise of the naive method is that the expected point is equal to the last observed point:

$$\hat{y}_{x+1} = y_x$$
Using this method we would forecast the next point to be 12.
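A sketch of the naive method in Python, assuming the example series [3, 10, 12, 13, 12, 10, 12] (the original listing was lost) and a hypothetical function name, naive:

```python
series = [3, 10, 12, 13, 12, 10, 12]  # assumed example series

# Naive method: the expected next value is simply the last observed value.
def naive(series):
    return series[-1]

print(naive(series))  # 12
```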
Simple Average
A less primitive method is the arithmetic average of all the previously observed data points. We take all the values we know, calculate the average and bet that that’s going to be the next value. Of course it won’t be exactly right, but it will probably be somewhere in the ballpark, and hopefully you can see the reasoning behind this simplistic approach.

$$\hat{y}_{x+1} = \frac{1}{x}\sum_{i=1}^{x} y_i$$
(Okay, this formula is only here because I think the capital Sigma looks cool. I am sincerely hoping that the average requires no explanation.) In Python:
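A minimal sketch of the simple average, assuming the example series [3, 10, 12, 13, 12, 10, 12] (the original listing was lost):

```python
series = [3, 10, 12, 13, 12, 10, 12]  # assumed example series

# Simple average: the expected next value is the mean of every observed value.
def average(series):
    return float(sum(series)) / len(series)

print(average(series))  # 72 / 7, roughly 10.29
```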
As a forecasting method, there are actually situations where it’s spot on. For example, your final school grade may be the average of all the previous grades.
Moving Average
An improvement over the simple average is the average of the last $n$ points. Obviously the thinking here is that only the recent values matter. Calculation of the moving average involves what is sometimes called a “sliding window” of size $n$:
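A sketch of the moving average, assuming the example series [3, 10, 12, 13, 12, 10, 12]; the slice series[-n:] plays the role of the sliding window positioned at the end of the series:

```python
series = [3, 10, 12, 13, 12, 10, 12]  # assumed example series

def average(values):
    return float(sum(values)) / len(values)

# Moving average: the mean of the last n observed values, i.e. a sliding
# window of size n positioned at the end of the series.
def moving_average(series, n):
    return average(series[-n:])

print(moving_average(series, 3))  # (12 + 10 + 12) / 3, roughly 11.33
print(moving_average(series, 4))  # (13 + 12 + 10 + 12) / 4 = 11.75
```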
A moving average can actually be quite effective, especially if you pick the right $n$ for the series. Stock analysts adore it.
Also note that the simple average is a variation of a moving average, thus the two functions above could be rewritten as a single recursive one (just for fun):
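One possible single recursive function, assuming the example series [3, 10, 12, 13, 12, 10, 12]. When n is omitted it calls itself with n equal to the length of the series, which turns the moving average into the simple average:

```python
series = [3, 10, 12, 13, 12, 10, 12]  # assumed example series

# One function for both: a simple average is a moving average whose
# window covers the whole series. With n omitted, the function calls
# itself with n set to the length of the series.
def average(series, n=None):
    if n is None:
        return average(series, len(series))
    return float(sum(series[-n:])) / n

print(average(series))     # whole series: 72 / 7, roughly 10.29
print(average(series, 3))  # last 3 points: roughly 11.33
```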
Weighted Moving Average
A weighted moving average is a moving average where, within the sliding window, values are given different weights, typically so that more recent points matter more.
Instead of selecting a window size, it requires a list of weights (which should add up to 1). For example, if we picked [0.1, 0.2, 0.3, 0.4] as weights, we would be giving 10%, 20%, 30% and 40% to the last 4 points, from the oldest to the most recent. In Python:
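A sketch of the weighted moving average, assuming the example series [3, 10, 12, 13, 12, 10, 12] and that the weights are listed from the oldest to the most recent point of the window. This version slices the window instead of modifying the weights list in place:

```python
series = [3, 10, 12, 13, 12, 10, 12]  # assumed example series

# Weighted moving average. The weights are listed from the oldest to the
# most recent point of the window, so the last weight applies to the last
# observation. They are expected to add up to 1.
def weighted_average(series, weights):
    if len(weights) > len(series):
        raise ValueError("more weights than data points")
    window = series[-len(weights):]  # the last len(weights) points
    return sum(y * w for y, w in zip(window, weights))

print(weighted_average(series, [0.1, 0.2, 0.3, 0.4]))  # roughly 11.5
```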
Weighted moving average is fundamental to what follows; please take a moment to understand it and give it a think before reading on.
I would also like to stress the importance of the weights adding up to 1. To demonstrate why, let’s say we pick weights [0.9, 0.8, 0.7, 0.6] (which add up to 3.0). Watch what happens:
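Here is a sketch of what happens, assuming the example series [3, 10, 12, 13, 12, 10, 12] and weights listed from the oldest to the most recent point. With weights summing to 3.0, the result lands at roughly three times the properly weighted value, well above anything in the series; dividing each weight by the total is one simple way to repair such a list:

```python
series = [3, 10, 12, 13, 12, 10, 12]  # assumed example series

def weighted_average(series, weights):
    window = series[-len(weights):]  # weights run oldest -> most recent
    return sum(y * w for y, w in zip(window, weights))

bad_weights = [0.9, 0.8, 0.7, 0.6]            # these add up to 3.0, not 1
print(weighted_average(series, bad_weights))  # roughly 35.5: about 3x too big

# Dividing each weight by their total restores a sum of 1.
total = sum(bad_weights)
normalized = [w / total for w in bad_weights]
print(weighted_average(series, normalized))   # roughly 11.83
```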
Picture time!
Here is a picture that demonstrates our tiny series and all of the above forecasts (except for naive).
It’s important to understand that which of the above methods is better very much depends on the nature of the series. The order in which I presented them was from simple to complex, but “more complex” doesn’t necessarily mean “better”.
Single Exponential Smoothing
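Here is where things get interesting. Imagine a weighted average where we consider all of the data points, while assigning exponentially smaller weights as we go back in time. For example, if we started with 0.9, our weights would be (going back in time):

$$0.9^1,\ 0.9^2,\ 0.9^3,\ 0.9^4,\ 0.9^5,\ 0.9^6,\ \ldots \quad \text{or} \quad 0.9,\ 0.81,\ 0.729,\ 0.6561,\ 0.59049,\ 0.531441,\ \ldots$$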
…eventually approaching the big old zero. In some ways this is very similar to the weighted average above, only the weights are dictated by math, decaying uniformly. The smaller the starting weight, the faster it approaches zero.
Only… there is a problem: the weights do not add up to 1. The sum of the first 3 numbers alone is already 2.439! (Exercise for the reader: what number does the sum of the weights approach, and why?)
What earned Poisson, Holt or Brown a permanent place in the history of mathematics is solving this with a succinct and elegant formula:

$$\hat{y}_x = \alpha \cdot y_x + (1-\alpha) \cdot \hat{y}_{x-1}$$
If you stare at it just long enough, you will see that the expected value $\hat{y}_x$ is the sum of two products: $\alpha \cdot y_x$ and $(1-\alpha) \cdot \hat{y}_{x-1}$. You can think of $\alpha$ (alpha) as a sort of starting weight, like the 0.9 in the above (problematic) example. It is called the smoothing factor or smoothing coefficient (depending on who wrote your textbook).
So essentially we’ve got a weighted moving average with two weights: $\alpha$ and $1-\alpha$. The sum of $\alpha$ and $1-\alpha$ is 1, so all is well.
Now let’s zoom in on the right side of the sum. Cleverly, $1-\alpha$ is multiplied by the previous expected value $\hat{y}_{x-1}$, which, if you think about it, is the result of the same formula. That makes the expression recursive (and programmers love recursion), and if you were to write it all out on paper you would quickly see that $(1-\alpha)$ is multiplied by itself again and again all the way to the beginning of the series, if there is one, or infinitely otherwise. And this is why this method is called exponential.
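Starting from $\hat{y}_x = \alpha \cdot y_x + (1-\alpha) \cdot \hat{y}_{x-1}$ and substituting the formula into itself a few times shows where the exponent comes from:

$$
\begin{aligned}
\hat{y}_x &= \alpha y_x + (1-\alpha)\,\hat{y}_{x-1} \\
          &= \alpha y_x + (1-\alpha)\left[\alpha y_{x-1} + (1-\alpha)\,\hat{y}_{x-2}\right] \\
          &= \alpha y_x + \alpha(1-\alpha)\,y_{x-1} + (1-\alpha)^2\,\hat{y}_{x-2} \\
          &= \alpha y_x + \alpha(1-\alpha)\,y_{x-1} + \alpha(1-\alpha)^2\,y_{x-2} + (1-\alpha)^3\,\hat{y}_{x-3} \\
          &= \ldots
\end{aligned}
$$

The weights $\alpha, \alpha(1-\alpha), \alpha(1-\alpha)^2, \ldots$ shrink geometrically, and for $0 < \alpha < 1$ their infinite sum is $\alpha / (1 - (1-\alpha)) = 1$. For a finite series, the leftover factor $(1-\alpha)^k$ lands on the starting expected value, so the total weight is still exactly 1.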
Another important thing about $\alpha$ is that its value dictates how much weight we give the most recent observed value versus the last expected value. It’s a kind of lever that gives more weight to the left side when it’s higher (closer to 1) or to the right side when it’s lower (closer to 0).
Perhaps $\alpha$ would be better referred to as the memory decay rate: the higher the $\alpha$, the faster the method “forgets”.
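To see the “forgetting” numerically: expanding the recursive formula, the observation $k$ steps back ends up with weight $\alpha(1-\alpha)^k$. The sketch below (effective_weights is a hypothetical helper) prints the first five such weights for $\alpha$ of 0.9 and 0.1:

```python
# Hypothetical helper: the weight that single exponential smoothing ends up
# giving to the observation k steps back, alpha * (1 - alpha) ** k
# (valid while the start of the series is further back than k).
def effective_weights(alpha, count):
    return [alpha * (1 - alpha) ** k for k in range(count)]

for alpha in (0.9, 0.1):
    # A high alpha puts almost all weight on the latest point and
    # "forgets" quickly; a low alpha spreads weight far into the past.
    print(alpha, [round(w, 4) for w in effective_weights(alpha, 5)])
```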
Why is it called “smoothing”?
To the best of my understanding this simply refers to the effect these methods have on a graph if you were to plot the values: jagged lines become smoother. A moving average also has the same effect, so it deserves the right to be called smoothing just as well.
Implementation
There is an aspect of this method that programmers would appreciate that is of no concern to mathematicians: it’s simple and efficient to implement. Here is some Python. Unlike the previous examples, this function returns expected values for the whole series, not just one point.
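A minimal sketch of single exponential smoothing, assuming the example series [3, 10, 12, 13, 12, 10, 12] (the original listing was lost). Seeding the first expected value with the first observation is one common choice, not the only one:

```python
series = [3, 10, 12, 13, 12, 10, 12]  # assumed example series

# Single exponential smoothing: returns an expected value for every point.
# There is no expected value before the first point, so the first
# observation is used as the starting expected value.
def exponential_smoothing(series, alpha):
    result = [series[0]]
    for n in range(1, len(series)):
        # alpha * latest observation + (1 - alpha) * previous expected value
        result.append(alpha * series[n] + (1 - alpha) * result[n - 1])
    return result

print(exponential_smoothing(series, 0.1))  # climbs slowly, ending near 6.99
print(exponential_smoothing(series, 0.9))  # tracks the series, ending near 11.82
```

Each step only needs the previous expected value, so a single pass over the series is enough.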
The figure below shows the exponentially smoothed version of our series with $\alpha$ of 0.9 (red) and $\alpha$ of 0.1 (orange).
Looking at the above picture, it is apparent that the $\alpha$ value of 0.9 follows the observed values much more closely than 0.1. This isn’t going to be true for every series; each series has its best $\alpha$ (or several). The process of finding the best $\alpha$ is referred to as fitting, and we will discuss it later separately.
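As a rough preview only (not the method discussed later in this series), one simple way to fit $\alpha$ is a brute-force grid search that keeps the $\alpha$ with the smallest SSE of one-step-ahead forecasts. Because $\hat{y}_x$ already incorporates $y_x$, it serves as the forecast for position $x+1$, so each observation is compared with the smoothed value just before it; comparing against $\hat{y}_x$ itself would trivially favor $\alpha = 1$. The example series [3, 10, 12, 13, 12, 10, 12], the function names and the 0.01 grid step are all assumptions.

```python
series = [3, 10, 12, 13, 12, 10, 12]  # assumed example series

def exponential_smoothing(series, alpha):
    result = [series[0]]
    for n in range(1, len(series)):
        result.append(alpha * series[n] + (1 - alpha) * result[n - 1])
    return result

def one_step_sse(series, alpha):
    # The smoothed value at position n-1 is the forecast for position n,
    # so compare each observation with the smoothed value just before it.
    smoothed = exponential_smoothing(series, alpha)
    return sum((series[n] - smoothed[n - 1]) ** 2 for n in range(1, len(series)))

def fit_alpha(series, steps=100):
    # Brute-force grid search over alpha in (0, 1]; keep the lowest SSE.
    candidates = [k / steps for k in range(1, steps + 1)]
    return min(candidates, key=lambda a: one_step_sse(series, a))

best = fit_alpha(series)
print(best, one_step_sse(series, best))
```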
Quick Review
We’ve learned some history and basic terminology (series and how it knows no time, method, error, SSE, MSE and fitting). And we’ve learned some basic forecasting methods: naive, simple average, moving average, weighted moving average and, finally, single exponential smoothing.
One very important characteristic of all of the above methods is that, remarkably, they can only forecast a single point. That’s correct, just one.
In Part II we will focus on methods that can forecast more than one point.
FAQs
What is the Holt-Winters forecasting method?
The Holt-Winters method uses exponential smoothing to encode lots of values from the past and use them to predict “typical” values for the present and future. Exponential smoothing refers to the use of an exponentially weighted moving average (EWMA) to “smooth” a time series.
Which is better, Holt-Winters or ARIMA?
In one comparison based on RMSE, ARIMA fit the training data better than Holt-Winters, but Holt-Winters gave a much more accurate forecast on the test data.
What do alpha, beta and gamma mean in Holt-Winters?
A Holt-Winters model is defined by its three smoothing parameters: alpha, beta and gamma. Alpha specifies the coefficient for the level smoothing, beta the coefficient for the trend smoothing, and gamma the coefficient for the seasonal smoothing.
Does Holt-Winters need stationary data?
Exponential smoothing methods, including Holt-Winters methods, are appropriate for (some kinds of) non-stationary data. In fact, they are only really appropriate if the data are non-stationary. Using an exponential smoothing method on stationary data is not wrong but is sub-optimal.
What are different methods of forecasting?
| Technique | Use |
|---|---|
| 1. Straight line | Constant growth rate |
| 2. Moving average | Repeated forecasts |
| 3. Simple linear regression | Compare one independent with one dependent variable |
| 4. Multiple linear regression | Compare more than one independent variable with one dependent variable |
Is ETS the same as Holt-Winters?
Holt-Winters, or triple exponential smoothing, is a sibling of ETS. If you understand Holt-Winters, you will easily be able to understand ETS, one of the most powerful prediction methods for time series data. Holt-Winters is also available out of the box in InfluxDB.
Is ARIMA better than exponential smoothing?
I found the only difference between ARIMA and exponential smoothing models is the weight assignment procedure for past lag values and the error term. In that case exponential smoothing should be considered much better than ARIMA due to its weight assignment method.
What is triple exponential smoothing?
Triple exponential smoothing is used to handle time series data containing a seasonal component. This method is based on three smoothing equations: stationary component (level), trend, and seasonal. Both seasonal and trend can be additive or multiplicative.
What does alpha mean in Holt-Winters?
Alpha is the parameter of the Holt-Winters filter that specifies how to smooth the level component. If numeric, it must be within the half-open unit interval (0, 1]. A small value means that older values in the series are weighted more heavily; values near 1.0 mean that the latest value has more weight.
What is beta in forecasting?
beta (β): smoothing parameter for the trend component of the forecast. The value of beta can be any number between 0 and 1, not inclusive. gamma (γ): smoothing parameter for the seasonality component of the forecast. The value of gamma can be any number between 0 and 1, not inclusive.
What does alpha mean in forecasting?
The value of 𝜶 (alpha) lies between 0 and 1. An alpha close to 0 gives more weight to historical data: new observations barely move the forecast (at exactly 0 the forecast never changes from its starting value). An alpha of 1 means that future forecast values equal the most recent observation, giving all the weight to recent observations.
What is the main drawback in applying the Holt-Winters method in practice?
Limitations of Holt-Winters’ Technique
One major limitation of this algorithm is the multiplicative form of the seasonality. The issue with multiplicative seasonality is how the model performs when some time frames have very low values.
What is the difference between Holt-Winters additive and multiplicative?
The additive method is preferred when the seasonal variations are roughly constant through the series, while the multiplicative method is preferred when the seasonal variations are changing proportional to the level of the series.
What is a measure of overall forecasting error?
Mean absolute deviation (MAD) is the mean or average value of forecast error for a certain period without considering sign. It measures the absolute magnitude of forecast error for the given period.
What algorithm does Excel use for forecasting?
The forecast predicts future values using your existing time-based data and the AAA version of the Exponential Smoothing (ETS) algorithm. The table can contain the following columns, three of which are calculated columns: Historical time column (your time-based data series)
How do you create a forecasting model in Excel?
In newer versions of Excel (i.e., Excel 2016 onwards), go to the Data tab and select Forecast Sheet. Then pick a suitable chart (line charts and column charts are best) and pick an end forecast date. Finally, click Create to generate a worksheet with your sales forecast.
What are the 3 forecasting techniques?
There are three basic types: qualitative techniques, time series analysis and projection, and causal models.
What is the best tool for forecasting?
 Pipedrive.
 Anaplan.
 SPOTIO.
 Gong.io.
 Workday Adaptive Planning.
 InsightSquared.
 Aviso Insights.
What are the 7 steps in the forecasting system?
 Determine what the forecast is for.
 Select the items for the forecast.
 Select the time horizon.
 Select the forecast model type.
 Gather data to be input into the model.
 Make the forecast.
 Verify and implement the results.
What is the difference between Holt and Holt-Winters?
Holt: exponential smoothing with a trend component, i.e. double exponential smoothing. Holt-Winters: exponential smoothing with a trend component and a seasonal component, i.e. triple exponential smoothing.
What is Holt’s model?
Holt’s two-parameter model, also known as linear exponential smoothing, is a popular smoothing model for forecasting data with trend. Holt’s model has three separate equations that work together to generate a final forecast.
What does ETS stand for?
ETS (Error, Trend, Seasonal) is an approach to forecasting univariate time series. The ETS model focuses on trend and seasonal components [7]. Its flexibility lies in its ability to model trend and seasonal components of different types.
Why is exponential smoothing better than moving average?
Whereas in moving averages the past observations are weighted equally, exponential smoothing assigns exponentially decreasing weights as the observations get older. In other words, recent observations are given relatively more weight in forecasting than older observations.
Which is the equivalent ARIMA model for Holt’s exponential smoothing method?
An equivalent ARIMA(0,1,1) model can be constructed to represent the single exponential smoother. Double exponential smoothing (also called Holt’s method) smooths the data when a trend is present; Holt’s linear method corresponds to an ARIMA(0,2,2) model.
What is the difference between ARIMA and SARIMA?
ARIMA and SARIMA are both algorithms for forecasting. ARIMA takes into account past values (autoregressive, moving average) and predicts future values based on them. SARIMA similarly uses past values but also takes into account any seasonality patterns.
What are the five steps for forecasting?
 Step 1: Problem definition.
 Step 2: Gathering information.
 Step 3: Preliminary exploratory analysis.
 Step 4: Choosing and fitting models.
 Step 5: Using and evaluating a forecasting model.
What is the difference between moving averages and exponential smoothing?
The primary difference between an
Why do we use simple exponential smoothing?
The simplest of the exponential smoothing methods is naturally called simple exponential smoothing (SES). This method is suitable for forecasting data with no clear trend or seasonal pattern. For example, the data in Figure 7.1 do not display any clear trending behaviour or any seasonality.
What is double exponential smoothing?
Double exponential smoothing employs a level component and a trend component at each period. Double exponential smoothing uses two weights, (also called smoothing parameters), to update the components at each period.
What is single exponential smoothing?
Single Exponential Smoothing, SES for short, also called Simple Exponential Smoothing, is a time series forecasting method for univariate data without a trend or seasonality. It requires a single parameter, called alpha (α), also called the smoothing factor or smoothing coefficient.
How do you find the alpha parameter?
alpha = (M^2)/[M2 - M^2];
What is the best period to estimate beta?
A common procedure is to use a beta estimated over the most recently available five-year period. For future reference we call this the “five-year rule of thumb”.
What is cumulative forecast error?
Cumulative forecast error is the sum of the forecast errors accumulated over time.
Why do financial analysts use monthly data when calculating beta?
Beta coefficients have in the past generally been estimated using monthly returns, mainly because these data were the most readily available. Today, betas may be estimated using weekly or even daily returns.
What is a smoothing factor?
The controlling input of the exponential smoothing calculation is known as the smoothing factor (also called the smoothing constant). It essentially represents the weighting applied to the most recent period's demand.
Does a higher or lower alpha value provide a smoother forecast?
This value determines the degree of smoothing by changing how quickly the level component adjusts to the most recent data. Alpha values can range from 0 to 1, inclusive. Lower values produce smoother fitted lines because they give more weight to past observations, averaging out fluctuations over time.
How do you smooth out seasonality?
When there is a seasonal pattern in your data and you want to remove it, set the length of your moving average to equal the pattern's length. If there is no seasonal pattern in your data, choose a length that makes sense. Longer lengths will produce smoother lines.
When implementing the Holt-Winters forecasting method, what do we disaggregate past demand observations into?
When implementing Holt-Winters’ forecasting method, we disaggregate past demand observations into base level, trend, and seasonal components, which lets it forecast demand for more than one time period into the future.
What is the exponential smoothing method?
Exponential smoothing is a rule of thumb technique for smoothing time series data using the exponential window function. Whereas in the simple moving average the past observations are weighted equally, exponential functions are used to assign exponentially decreasing weights over time.
How is a seasonal forecast calculated?
You can forecast monthly sales by multiplying your estimated sales for next year by the seasonal index for each month. Or you can estimate a 12-month trend for your de-seasonalized sales and then apply the seasonal index to forecast your actual sales amounts.
What is the exponential smoothing formula?
The exponential smoothing calculation is: the most recent period's demand multiplied by the smoothing factor, plus the most recent period's forecast multiplied by (one minus the smoothing factor). Here S, the smoothing factor, is represented in decimal form (so 35% would be represented as 0.35).
What are the key components of a demand forecast strategy?
To be successful, demand forecasting for a supply chain should include these components: Clean, reliable data, including historical data and trend projections. Actionable inputs, including from sales team members, outside experts, and market research. Robust supply chain analytics.
Which is better, moving average or exponential smoothing?
For a given average age (i.e., amount of lag), the simple exponential smoothing (SES) forecast is somewhat superior to the simple moving average (SMA) forecast because it places relatively more weight on the most recent observation, i.e., it is slightly more "responsive" to changes occurring in the recent past.
When should you use exponential smoothing?
Exponential smoothing is a way to smooth out data for presentations or to make forecasts. It's usually used for finance and economics. If you have a time series with a clear pattern, you could use moving averages, but if you don't have a clear pattern you can use exponential smoothing to forecast.
What is the best forecasting method for seasonal data?
Damped Trend Multiplicative Seasonal Method
This method is best for data with a trend and with seasonality. It results in a curved forecast that flattens over time and reproduces the seasonal cycles.
How do you forecast seasonal data in Excel?
Excel's FORECAST.ETS.SEASONALITY function returns the length of the repetitive (seasonal) pattern that Excel detects in a time series, and FORECAST.ETS uses exponential smoothing to predict future values that follow such patterns. These functions expect regularly spaced data, such as monthly, quarterly, or annual values.
How do you calculate seasonality in Excel?
Enter the following formula into cell C2: "=B2 / B$15" omitting the quotation marks. This will divide the actual sales value by the average sales value, giving a seasonal index value.
What alpha should I use for exponential smoothing?
Start with an alpha between 0.2 and 0.5 and see how it fits your data. Set it higher to fit changes in the data more closely. Set it lower to emphasize longer term trend.
What is the best alpha for exponential smoothing?
We choose the best value for $\alpha$, that is, the value which results in the smallest MSE. In one worked example, the sum of the squared errors (SSE) was 208.94 and the mean of the squared errors (MSE) was SSE/11 = 19.0. The MSE was then calculated again for $\alpha$ = 0.5 and turned out to be 16.29, so in this case we would prefer an $\alpha$ of 0.5.