FORECASTING VOLATILITY OF STOCK RETURNS WITH ARCH
FORECASTING-PROJECT REPORT
SIMON MALZER (A0406398), DANIEL TSCHOPP (A0309556), DAVID ZENZ (A9271378)
Abstract. This project paper reports on the implementation of some of the statistical models
and forecasting procedures presented in class. We employ an autoregressive conditional
heteroscedasticity (ARCH) model to analyse stock return data and to generate forecasts of the
volatility of this high-frequency financial data series. In the following we present the
underlying model, provide tests for model selection, estimate the selected model and conduct
forecasts.
1. Introduction
A frequently observed feature of financial time series is what is referred to as volatility
clustering: big shocks tend to be followed by big shocks in either direction, and small shocks
tend to be followed by small shocks. These patterns can be captured adequately by ARCH models.
As volatility is considered a measure of risk, modelling and forecasting volatility is a key issue
for risk-averse investors.
In this paper, we apply ARCH models to stock returns obtained from ultra-high-frequency
Amazon transaction data in order to forecast volatility. We do this by (1) testing the
data for ARCH effects, (2) determining the lag order from the PACF and information criteria,
(3) fitting the model to the data using maximum likelihood estimation, (4) evaluating forecasting
performance for different time horizons, and (5) comparing models with varying lag order. The
duration between two transactions is not considered, neither with regard to returns (the
duration between two transactions is set to one irrespective of the actual duration) nor with
regard to modelling and forecasting volatility.
The structure of this report is as follows. Section 2 introduces the model, followed by the data
description in section 3. Model selection and estimation methods are dealt with in section 4.
Empirical results and forecasts are presented and discussed in section 5; the final section 6
summarizes our findings.
2. The Model
Autoregressive conditional heteroscedasticity (ARCH) models provide a systematic framework
to model volatility and were first introduced by Engle (1982). We employ an ARCH model
of the following form
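\begin{equation}
\begin{aligned}
r_t &= \mu + \varepsilon_t \\
\varepsilon_t &= \sigma_t \gamma_t \\
\gamma_t &\sim \mathrm{IID}(0, 1) \\
\sigma_t^2 &= \omega + \sum_{j=1}^{q} \alpha_j \varepsilon_{t-j}^2
\end{aligned}
\tag{1}
\end{equation}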
where r_t denotes the logarithmic stock returns as obtained from the data by (2), µ the
expected return and ε_t the error term. The error terms are split into a time-dependent
standard deviation σ_t and the stochastic part γ_t, which is a sequence of independent and
identically distributed random variables with mean zero and variance 1.
The assumptions imply ε_t | I_{t-1} ∼ D(0, σ_t^2), constant expected conditional returns
E{r_t | I_{t-1}} = µ (as proposed by theory), and a conditional variance σ_t^2 that depends on
the past squared shocks ε_{t-j}^2. The only observed variable in this model is the return r_t;
thus, apart from the parameters, the volatility has to be estimated from the data as well.
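To make the recursion in (1) concrete, the following is a minimal simulation sketch. It assumes standard normal innovations γ_t (the model itself only requires IID(0, 1)), starts the recursion at the unconditional variance and discards a burn-in period; the function name and all parameter values used with it are illustrative and not taken from our estimation.

```python
import math
import random


def simulate_arch(n, mu, omega, alphas, seed=0, burn_in=500):
    """Simulate n returns r_t = mu + eps_t with ARCH(q) errors as in (1).

    Assumption: gamma_t is standard normal and sum(alphas) < 1.
    """
    rng = random.Random(seed)
    q = len(alphas)
    uncond = omega / (1.0 - sum(alphas))  # starting value for past squared shocks
    eps_sq = [uncond] * q                 # most recent squared shock first
    returns, variances = [], []
    for t in range(n + burn_in):
        # conditional variance from the q most recent squared shocks
        sigma2 = omega + sum(a * e for a, e in zip(alphas, eps_sq))
        eps = math.sqrt(sigma2) * rng.gauss(0.0, 1.0)
        eps_sq = [eps * eps] + eps_sq[:-1]
        if t >= burn_in:
            returns.append(mu + eps)
            variances.append(sigma2)
    return returns, variances
```

Plotting the simulated returns for, say, alphas = [0.2, 0.1] should show the alternation of calm and turbulent phases that motivates the model.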
3. The data
The data used in this project report is taken from LOBSTER (Limit Order Book System -
The Efficient Reconstructor), an online limit order book data tool, developed and run by a
group of financial econometricians affiliated with Humboldt University Berlin, Germany and
the University of Vienna, Austria. LOBSTER provides a ’message’ and an ’order book’ file
which contain data about the type of an event, price range, time, etc. as can be seen in
Table 1 on page 3, where
LOBSTER NASDAQ data for Amazon
time       event type   order ID   size   price     direction
...        ...          ...        ...    ...       ...
34200.19   1            11885113   21     2238100    1
34200.19   1            3911376    20     2239600   -1
34200.19   1            11534792   100    2237500    1
34200.19   1            1365373    13     2240000   -1
34200.19   1            11474176   2      2236500    1
34200.19   1            1847685    100    2240000   -1
...        ...          ...        ...    ...       ...
Table 1. Sample output of LOBSTER ’message’ file
time: is measured in seconds after midnight with decimal precision of at least milliseconds
and up to nanoseconds depending on the period requested
event type: is subdivided by
1: Submission of a new limit order
2: Cancellation (partial deletion of a limit order)
3: Deletion (total deletion of a limit order)
4: Execution of a visible limit order
5: Execution of a hidden limit order
7: Trading halt indicator
order ID: is the unique order reference number
size: is the quantity of shares
price: is the price in dollars times 10,000
direction: is subdivided by
-1: Sell limit order
1: Buy limit order
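The column layout described above can be read with a few lines of code. The sketch below assumes that the 'message' file is comma-separated and has no header row; the function name is our own, and the two sample rows are taken from Table 1.

```python
import csv
import io


def read_messages(handle):
    """Parse headerless, comma-separated message rows into dictionaries."""
    rows = []
    for rec in csv.reader(handle):
        if not rec:
            continue  # skip blank lines
        time, event, order_id, size, price, direction = rec[:6]
        rows.append({
            "time": float(time),              # seconds after midnight
            "event_type": int(event),
            "order_id": int(order_id),
            "size": int(size),
            "price": int(price) / 10000.0,    # stored as dollars times 10,000
            "direction": int(direction),
        })
    return rows


sample = io.StringIO(
    "34200.19,1,11885113,21,2238100,1\n"
    "34200.19,1,3911376,20,2239600,-1\n"
)
messages = read_messages(sample)
# Keep executions of visible limit orders only (event type 4).
executions = [m for m in messages if m["event_type"] == 4]
```

Both sample rows are limit order submissions (event type 1), so the list of executions is empty for this excerpt.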
               time    size      price   log return   squared log return
Min            34200   1.00      220.5   -1.470e-03   0.000e+00
1st Quartile   39200   20.00     221.1   -4.251e-05   5.000e-12
Median         47821   100.00    222.5    2.232e-06   2.184e-09
Mean           47179   93.05     222.5    0.000e+00   3.228e-08
3rd Quartile   55282   100.00    223.9    4.732e-05   1.855e-08
Max.           57600   4018.00   226.0    1.387e-03   2.162e-06
Table 2. Descriptive statistics of dataset
3.1. The Sample. The available sample files from the LOBSTER project contain data on
Amazon stock trades on June 26, 2012, over a period of about 6.5 hours from 9:30 am until
4:00 pm, based on the official NASDAQ Historical TotalView-ITCH sample. Whereas the whole
dataset contains 269,747 observations, we focused on realized transactions only (event type 4,
execution of a visible limit order). For time points with more than one realized transaction
we averaged over the observed prices, using the order sizes as weights, and dropped the other
observations from our dataset. Thus we obtained a time series of prices (p_t)_{t=1}^{n}.
For t < n we then calculated logarithmic returns by
\begin{equation}
r_t = \log\left(\frac{p_{t+1}}{p_t}\right)
\tag{2}
\end{equation}
which led to a total of 6,590 observations. A plot of this return series is shown in the Appendix
(see Figure 7 on page 11).
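The two preprocessing steps described in this subsection, size-weighted averaging per time point and the log returns in (2), can be sketched as follows. This is an illustration rather than the code used for the report; the function names and the input layout (tuples of time, size and price, sorted by time) are assumptions.

```python
import math


def size_weighted_prices(trades):
    """Collapse trades to one size-weighted average price per timestamp.

    trades: iterable of (time, size, price) tuples, sorted by time.
    """
    totals = {}  # time -> (sum of size * price, sum of size); dicts keep insertion order
    for time, size, price in trades:
        value, volume = totals.get(time, (0.0, 0))
        totals[time] = (value + size * price, volume + size)
    return [value / volume for value, volume in totals.values()]


def log_returns(prices):
    """r_t = log(p_{t+1} / p_t) for t < n, as in (2)."""
    return [math.log(nxt / cur) for cur, nxt in zip(prices, prices[1:])]
```

For example, two trades at the same time stamp with sizes 10 and 30 and prices 100 and 104 collapse to a single price of (10 * 100 + 30 * 104) / 40 = 103.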
3.2. Descriptive and Explorative Analysis. The intraday price of the Amazon stock shows a
decreasing trend with clusters along the price curve (see Figure 1 on page 5). Table 2 on page 4
reports the time interval of 6.5 hours, starting at second 34200 (9:30 am) and ending at second
57600 (4:00 pm). The price (recorded in the raw data as dollars times 10,000) ranges from
$220.5 (lowest) to $226 (highest), and both the mean and the median price are $222.5 per
share. Plotting the histogram of log returns together with the normal density (see Appendix,
Figure 6 on page 10) shows an empirical distribution that resembles the normal.
Figure 1. Development of Price over time (x-axis: transaction; y-axis: price)
Figure 2. ACF and PACF of log returns. Panels: (a) ACF - log returns; (b) PACF - log returns (x-axis: Lag)
Figure 3. ACF and PACF of squared log returns. Panels: (a) ACF - squared log returns; (b) PACF - squared log returns (x-axis: Lag)
            statistic   df   p-value
Box-Ljung   9.4134      1    0.002154
Table 3. Box-Ljung test results
4. Model selection
4.1. Testing for ARCH-effects. Because volatility clustering is, as mentioned above, usually
a property of financial time series data, we first tested for so-called ARCH effects. The
ACF of the log return series (see Figure 2 on page 5) indicates very low serial correlation,
which is in line with an insignificant Box-Ljung test statistic for this series. For the squared
log return series (see Figure 3 on page 6) the finding of almost no serial correlation does
not hold any more. Clearly, as shown by the ACF and PACF, there is some correlation in the
squared log return series, which is confirmed by a significant Box-Ljung test statistic. The
Box-Ljung Q(m) statistic can be used to test whether the first m lags of the ACF are jointly zero.
The resulting test statistic for a test with one lag, its degrees of freedom and the p-value can
be seen in Table 3 on page 6. The test statistic and the resulting p-value indicate that the
H0 that the first lag of the ACF is zero can clearly be rejected.
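For reference, the Q(m) statistic in its usual form is Q(m) = n(n + 2) Σ_{k=1}^{m} ρ̂_k^2 / (n − k), which is compared with a chi-square distribution with m degrees of freedom. The sketch below computes it in plain Python; the function names are our own, and the p-value helper covers only the one-lag case reported in Table 3.

```python
import math


def autocorr(x, k):
    """Sample autocorrelation of x at lag k."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    num = sum((x[t] - mean) * (x[t - k] - mean) for t in range(k, n))
    return num / denom


def ljung_box(x, m):
    """Box-Ljung Q(m) statistic for the first m autocorrelations of x."""
    n = len(x)
    return n * (n + 2) * sum(autocorr(x, k) ** 2 / (n - k) for k in range(1, m + 1))


def chi2_sf_1df(q):
    """P(X > q) for a chi-square variable X with 1 degree of freedom."""
    return math.erfc(math.sqrt(q / 2.0))
```

Applied to the squared log returns, the call would be ljung_box([r * r for r in returns], 1). As a cross-check, chi2_sf_1df(9.4134) gives approximately 0.00215, in line with the p-value in Table 3.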
                  statistic   df   p-value
Engle (3-lags)    24.57       3    8.332e-16
Engle (12-lags)   17.55       10   2.2e-16
Table 4. Engle test results
The second statistical test used to check for ARCH effects was the Lagrange multiplier
test proposed by Engle (1982), which is another test for conditional heteroskedasticity.
This test regresses the squared errors on their own lags and conducts an F-test of the null
hypothesis that the regression coefficients on the lags are jointly zero. The findings for
different lag orders are summarized in Table 4 on page 7. As the p-values for both lag orders
are close to zero, H0 can be rejected, indicating again the presence of ARCH effects.
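The auxiliary regression behind this test can be sketched in plain Python as follows. The sketch regresses the squared residuals on a constant and q of their own lags and returns both the LM form (number of observations times R^2, asymptotically chi-square with q degrees of freedom under H0) and the F form. The function names are our own, the squared residuals are rescaled by their mean purely for numerical stability (R^2 is unaffected), and the results need not coincide exactly with the output of the software used for Table 4.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def arch_lm(resid, q):
    """Return (LM, F) for the ARCH test with q lags of the squared residuals."""
    e2 = [e * e for e in resid]
    scale = sum(e2) / len(e2)
    e2 = [v / scale for v in e2]          # rescale; R^2 is scale invariant
    y = e2[q:]
    X = [[1.0] + [e2[t - j] for j in range(1, q + 1)] for t in range(q, len(e2))]
    k = q + 1
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    beta = solve_linear(xtx, xty)         # OLS coefficients
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in X]
    ybar = sum(y) / len(y)
    ssr = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - ybar) ** 2 for yi in y)
    r2 = 1.0 - ssr / sst
    n = len(y)
    lm = n * r2
    f = (r2 / q) / ((1.0 - r2) / (n - q - 1))
    return lm, f
```

In our setting, resid would be the demeaned log returns and q the lag order given in Table 4.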
4.2. Determining the lag order. To determine the lag order of the ARCH model to be fitted,
several approaches were used. The PACF showed persistent correlation, but with some
breaks of close-to-zero partial autocorrelation at certain lags. One of these breaks was at
the 4th lag, another at the 12th lag and some more from lag 16 onwards (see Figure 2 on page
5).
Because the PACF plot gave no clear indication, we based the selection of the appropriate
lag order on information criteria. The Akaike information criterion (AIC), which
is commonly used to determine the best model for forecasting, suggested three lags as the
appropriate lag order for our ARCH model. Because this result agreed with the visual analysis
of the PACF, which showed the first three lags as significantly different from
zero, the ARCH(3) model was chosen for our further investigation, in particular for the forecast
of future volatility provided in the next section.
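The AIC comparison can be illustrated with the following sketch. It assumes a Gaussian conditional likelihood (the model itself only states γ_t ∼ IID(0, 1)) and conditions on the first q observations; the function names are our own, and the numerical maximisation of the likelihood over the parameters is not shown.

```python
import math


def arch_loglik(returns, mu, omega, alphas):
    """Gaussian conditional log-likelihood of an ARCH(q) model as in (1)."""
    q = len(alphas)
    eps = [r - mu for r in returns]
    loglik = 0.0
    for t in range(q, len(eps)):
        sigma2 = omega + sum(a * eps[t - j] ** 2
                             for j, a in enumerate(alphas, start=1))
        loglik += -0.5 * (math.log(2.0 * math.pi) + math.log(sigma2)
                          + eps[t] ** 2 / sigma2)
    return loglik


def aic(loglik, n_params):
    """Akaike information criterion: 2k - 2 log L (smaller is better)."""
    return 2 * n_params - 2 * loglik
```

An ARCH(q) model with mean µ has q + 2 parameters. When comparing different q, the likelihoods should be evaluated over the same observations, for example by dropping the first q_max observations for every candidate model.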
5. Empirical Results and Forecast
5.1. Empirical Results. The fitted ARCH(3) model is shown in (3); it relies on
the assumption that γ_t is distributed IID(0,1). The estimate of µ
is not significant at any conventional significance level, whereas α0 (the intercept ω in (1)),
α1 and α3 are highly significant and α2 is significant at least at the 10% level.
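\begin{equation}
\begin{aligned}
r_t &= -2.562 \times 10^{-6} + \varepsilon_t \\
\sigma_t^2 &= 2.4408 \times 10^{-8} + 1.3987 \times 10^{-1} \varepsilon_{t-1}^2
  + 3.0773 \times 10^{-2} \varepsilon_{t-2}^2 + 9.7839 \times 10^{-2} \varepsilon_{t-3}^2
\end{aligned}
\tag{3}
\end{equation}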
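As a plausibility check on (3), one can compute the unconditional variance implied by the estimates. The estimated lag coefficients sum to about 0.27, which is below one, so the standard ARCH formula applies:

```latex
\begin{equation*}
\bar{\sigma}^2
  = \frac{\omega}{1 - \sum_{j=1}^{3} \alpha_j}
  = \frac{2.4408 \times 10^{-8}}{1 - 0.13987 - 0.030773 - 0.097839}
  = \frac{2.4408 \times 10^{-8}}{0.731518}
  \approx 3.34 \times 10^{-8}
\end{equation*}
```

This is of the same order of magnitude as the mean of the squared series reported in Table 2 (3.228e-08) and corresponds to an unconditional standard deviation of roughly 1.8e-04 per transaction.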
5.2. Forecast. The ARCH(3) model was used to forecast the volatility of the
underlying Amazon return series. Forecasts from an ARCH volatility model are created by
recursive substitution, as with AR models. An in-sample forecast for the last 100 observations,
obtained from our ARCH(3) model, is shown in Figure 4 on page 9. Volatility is predicted to stay
within a narrow band, mainly because it was declining over the whole series of the
subsample used and was therefore lower at the end.
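The recursive substitution works as follows: the one-step forecast uses the last q observed squared shocks, and for longer horizons every squared shock that is not yet observed is replaced by its own variance forecast. A minimal sketch, with an illustrative function name and input layout of our own choosing:

```python
def forecast_variance(eps_sq_recent, omega, alphas, horizon):
    """Multi-step ARCH(q) variance forecasts by recursive substitution.

    eps_sq_recent: observed squared shocks, most recent first; it must
    contain at least q = len(alphas) values.
    Returns the forecasts for steps 1, ..., horizon.
    """
    q = len(alphas)
    history = list(eps_sq_recent[:q])  # history[0] is the latest squared shock
    forecasts = []
    for _ in range(horizon):
        sigma2 = omega + sum(a * e for a, e in zip(alphas, history))
        forecasts.append(sigma2)
        # the forecast stands in for the squared shock that is not yet observed
        history = [sigma2] + history[:-1]
    return forecasts
```

With ω = 1, a single lag coefficient of 0.5 and a last squared shock of 4, the forecasts are 3, 2.5, 2.25 and so on, approaching the unconditional variance of 2. This mean reversion is one reason why multi-step ARCH forecasts stay within a narrow band.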
In order to get a preliminary insight into the forecasting ability of our model, the end of the
time series used, and therefore also the actually realized values for the "forecasted" interval, is
shown in Figure 8 on page 11. It can be seen that the actual volatility increases again at
the very end of the series, meaning that deviations from the mean become larger. Nevertheless,
some measure would have to be employed to capture these effects and to verify the obtained
forecast in a meaningful way. For example, the forecasting error, or some similar measure
of forecasting accuracy, would be needed to correctly evaluate the forecasting ability of our
model.
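Two simple candidates for such a measure are the mean squared error and the mean absolute error of the variance forecasts. The sketch below is an illustration only; no such evaluation was carried out in this project. It assumes that squared demeaned returns serve as a (noisy) proxy for the realized variance.

```python
def mean_squared_error(forecasts, realized):
    """Average squared difference between forecasts and realized values."""
    return sum((f - y) ** 2 for f, y in zip(forecasts, realized)) / len(forecasts)


def mean_absolute_error(forecasts, realized):
    """Average absolute difference between forecasts and realized values."""
    return sum(abs(f - y) for f, y in zip(forecasts, realized)) / len(forecasts)
```

Computing these measures for several competing models over the same forecast interval would allow a ranking of their forecasting ability.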
6. Conclusions
The ARCH(3) model employed in this forecasting project seems to fit the data quite well.
Although the mean equation is trivial, volatility is captured well, as the intercept and two of
the three lags are statistically significant at the 1% level and the second lag at least at the
10% level.
Regarding forecasting ability, the findings of this forecasting project are ambiguous.
Other models would have to be fitted to our underlying series and their forecasts compared with
the ARCH(3) forecasts obtained here. Different types of GARCH models come to mind first when
looking for alternative models, because a well-known shortcoming of the ARCH model is the
symmetric treatment of volatility shocks, which is unrealistic for asset return series.
Further attempts at modelling our time series therefore seem necessary in order to reasonably
evaluate the forecasting performance of the employed ARCH(3) model.
Figure 4. In-sample forecast of log returns (plot title: Prediction with confidence intervals; legend: X̂_{t+h}, X̂_{t+h} − 1.96 MSE, X̂_{t+h} + 1.96 MSE; x-axis: Index)
7. Appendix
Figure 5. Boxplot and Q-Q Plot of price. Panels: (a) Boxplot - price; (b) Q-Q Plot - price (x-axis: Theoretical Quantiles; y-axis: Sample Quantiles)
Figure 6. Histogram of log returns (plot title: log returns with normal distribution; x-axis: log returns; y-axis: Frequency)
Figure 7. Full log return series of Amazon stock (plot title: Whole series; x-axis: index; y-axis: return)
Figure 8. End of the actual time series (plot title: Actual series without prediction; x-axis: Index)
8. References
Torben Andersen et al.: "Handbook of Financial Time Series", Springer-Verlag Berlin
Heidelberg, 2009
Robert Engle: "Autoregressive Conditional Heteroscedasticity with Estimates of the
Variance of United Kingdom Inflation", Econometrica, The Econometric Society, Vol.
50, No. 4, July 1982, pp. 987-1007
Robert Engle: "The Econometrics of Ultra-High-Frequency Data", Econometrica, Vol.
68, No. 1, January 2000, pp. 1-22
Marno Verbeek: "A Guide to Modern Econometrics", John Wiley & Sons, 2nd edition,
2004