Project Report - FORECASTING VOLATILITY OF STOCK RETURNS WITH ARCH FORECASTING-PROJECT REPORT

FORECASTING VOLATILITY OF STOCK RETURNS WITH ARCH

FORECASTING-PROJECT REPORT

SIMON MALZER (A0406398), DANIEL TSCHOPP (A0309556), DAVID ZENZ (A9271378)

Abstract. The purpose of this project paper is to report the implementation of some of the
statistical models and forecasting procedures presented in class. We started with the idea of
employing an autoregressive conditional heteroscedasticity (ARCH) model to analyse
stock return data and to generate forecasts of the volatility of this high-frequency financial data
series. In the following we present the underlying model, provide tests for model selection,
estimate the selected model and conduct forecasts.

1. Introduction

A frequently observed feature of financial time series is volatility clustering: big shocks
tend to be followed by big shocks in either direction, and small shocks tend to be followed
by small shocks. These patterns can be captured adequately by ARCH models.
As volatility is considered a measure of risk, modelling and forecasting volatility is a key issue
for risk-averse investors.

In this paper, we apply ARCH models to stock returns obtained from ultra-high-frequency
Amazon transaction data in order to forecast volatility. We do this by (1) testing the
data for ARCH effects, (2) determining the lag order from the PACF and information criteria,
(3) fitting the model to the data using maximum likelihood estimation, (4) evaluating forecasting
performance for different time horizons, and (5) comparing models with varying lag order. The
duration between two transactions is not considered, neither with regard to returns (the
duration between two transactions is set to one irrespective of the actual duration) nor with
regard to modelling and forecasting volatility.

The structure of this report is as follows. Section 2 introduces the model, followed by the data
description in section 3. Model selection and estimation methods are dealt with in section 4.
Empirical results and forecasts are presented and discussed in section 5; the final section 6
summarizes our findings.

2. The Model

Autoregressive conditional heteroscedasticity (ARCH) models provide a systematic framework

to model volatility and were first introduced by Engle (1982). We employ an ARCH model

of the following form

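\begin{align}
r_t &= \mu + \varepsilon_t \notag \\
\varepsilon_t &= \sigma_t \gamma_t \notag \\
\gamma_t &\sim \mathrm{IID}(0, 1) \notag \\
\sigma_t^2 &= \omega + \sum_{j=1}^{q} \alpha_j \varepsilon_{t-j}^2 \tag{1}
\end{align}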

where $r_t$ denotes the logarithmic stock returns obtained from the data by (2), $\mu$ the
expected return and $\varepsilon_t$ the error term. The error terms are split into a time-dependent
standard deviation $\sigma_t$ and a stochastic part $\gamma_t$, which is a sequence of independent and
identically distributed random variables with mean zero and variance one.

The assumptions imply $\varepsilon_t | I_{t-1} \sim D(0, \sigma_t^2)$, where $I_{t-1}$ denotes the information
available at time $t-1$, constant expected conditional returns $E\{r_t | I_{t-1}\} = \mu$
(as proposed by theory) and a conditional variance $\sigma_t^2$ that depends on past volatility
shocks. The only observed variable in this model is the return $r_t$; thus, apart from the
parameters, the volatility has to be estimated from the data as well.
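To make the mechanics of (1) concrete, the following minimal sketch simulates an ARCH(q) return series. It is an illustration only: the function name `simulate_arch`, the burn-in length and the choice of a standard normal distribution for $\gamma_t$ are assumptions of this sketch, since the model itself only requires mean zero and variance one.

```python
import math
import random


def simulate_arch(n, mu, omega, alphas, seed=0, burn_in=500):
    """Simulate n returns r_t = mu + eps_t with ARCH(q) errors as in (1)."""
    rng = random.Random(seed)
    q = len(alphas)
    eps = [0.0] * q  # start-up values for the lagged shocks
    returns, variances = [], []
    for t in range(n + burn_in):
        # sigma_t^2 = omega + sum_j alpha_j * eps_{t-j}^2
        sigma2 = omega + sum(a * eps[-j] ** 2 for j, a in enumerate(alphas, start=1))
        gamma = rng.gauss(0.0, 1.0)  # assumption: gamma_t is standard normal
        shock = math.sqrt(sigma2) * gamma
        eps.append(shock)
        if t >= burn_in:  # discard the burn-in so start-up values do not matter
            returns.append(mu + shock)
            variances.append(sigma2)
    return returns, variances
```

Because every $\alpha_j$ is non-negative, the simulated conditional variance never falls below $\omega$, and large shocks raise the variance of the following q observations, which produces the volatility clustering described in the introduction.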

3. The data

The data used in this project report are taken from LOBSTER (Limit Order Book System -
The Efficient Reconstructor), an online limit order book data tool developed and run by a
group of financial econometricians affiliated with Humboldt University Berlin, Germany, and
the University of Vienna, Austria. LOBSTER provides a 'message' file and an 'order book' file,
which contain data about the type of an event, price range, time, etc. Table 1 shows sample
output of the 'message' file; its columns are explained below the table.

LOBSTER NASDAQ data for Amazon

time       event type   order ID   size   price     direction
...        ...          ...        ...    ...       ...
34200.19   1            11885113    21    2238100    1
34200.19   1             3911376    20    2239600   -1
34200.19   1            11534792   100    2237500    1
34200.19   1             1365373    13    2240000   -1
34200.19   1            11474176     2    2236500    1
34200.19   1             1847685   100    2240000   -1
...        ...          ...        ...    ...       ...

Table 1. Sample output of LOBSTER 'message' file

time: is measured in seconds after midnight with decimal precision of at least milliseconds
    and up to nanoseconds, depending on the period requested
event type: is subdivided into
    1: Submission of a new limit order
    2: Cancellation (partial deletion of a limit order)
    3: Deletion (total deletion of a limit order)
    4: Execution of a visible limit order
    5: Execution of a hidden limit order
    7: Trading halt indicator
order ID: is the unique order reference number
size: is the quantity of shares
price: is the price in dollars times 10,000
direction: is subdivided into
    -1: Sell limit order
    1: Buy limit order


               time    size      price   log return   squared log return
Min            34200     1.00    220.5   -1.470e-03   0.000e+00
1st Quartile   39200    20.00    221.1   -4.251e-05   5.000e-12
Median         47821   100.00    222.5    2.232e-06   2.184e-09
Mean           47179    93.05    222.5    0.000e+00   3.228e-08
3rd Quartile   55282   100.00    223.9    4.732e-05   1.855e-08
Max.           57600  4018.00    226.0    1.387e-03   2.162e-06

Table 2. Descriptive statistics of dataset

3.1. The Sample. The available sample files from the LOBSTER project contain data on
Amazon stock trades on June 26, 2012, covering a period of about 6.5 hours from 9:30 am until
4:00 pm, based on the official NASDAQ Historical TotalView-ITCH sample. Whereas the whole
dataset contains 269,747 observations, we focused only on realized transactions (event type 4,
execution of a visible limit order). For time points with more than one realized transaction
we averaged over the observed prices, using the order sizes as weights, and dropped the other
observations from our dataset. Thus we obtained a time series of prices $(p_t)_{t=1}^{n}$.
For $t < n$ we then calculated logarithmic returns by

\[ r_t = \log\left(\frac{p_{t+1}}{p_t}\right) \tag{2} \]

which led to a total of 6,590 observations. A plot of this return series is shown in the Appendix
(see Figure 7).
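The preprocessing described above can be sketched as follows. This is a hypothetical reconstruction, not the code used for the report: the function names and the tuple layout are assumptions, and the rows are assumed to arrive in time order with the columns of the 'message' file.

```python
import math


def executed_prices(messages):
    """Size-weighted average execution price (in dollars) per timestamp.

    messages: iterable of (time, event_type, order_id, size, price, direction)
    tuples, with price given in dollars times 10,000.
    """
    totals = {}  # time -> [sum(size * price), sum(size)], in order of first appearance
    for time, event_type, _order_id, size, price, _direction in messages:
        if event_type != 4:  # keep executions of visible limit orders only
            continue
        acc = totals.setdefault(time, [0.0, 0.0])
        acc[0] += size * price
        acc[1] += size
    return [value / volume / 10_000 for value, volume in totals.values()]


def log_returns(prices):
    """r_t = log(p_{t+1} / p_t) for t < n, as in (2)."""
    return [math.log(nxt / cur) for cur, nxt in zip(prices, prices[1:])]
```

A series of n prices yields n - 1 returns, so the return series is one observation shorter than the price series.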

3.2. Descriptive and Exploratory Analysis. The intraday price of the Amazon stock shows a
decreasing trend with clusters along the price curve (see Figure 1). Table 2 reports the time
interval of 6.5 hours, starting at second 34200 (9:30 am) and ending at second 57600 (4:00 pm).
The price (recorded in the raw data in dollars times 10,000) ranges from $220.5 (lowest) to
$226 (highest); both the mean and the median price are $222.5 per share. Plotting the
histogram of log returns together with the normal distribution (see Appendix, Figure 6) shows
an empirical distribution whose shape resembles the normal distribution.


[Figure 1: line plot of price (y-axis) against transaction index (x-axis)]

Figure 1. Development of price over time

[Figure 2: (a) ACF - log returns; (b) PACF - log returns; both plotted against lag (0 to 30)]

Figure 2. ACF and PACF of log returns


[Figure 3: (a) ACF - squared log returns; (b) PACF - squared log returns; both plotted against lag (0 to 30)]

Figure 3. ACF and PACF of squared log returns

             statistic   df   p-value
Box-Ljung    9.4134      1    0.002154

Table 3. Box-Ljung test results

4. Model selection

4.1. Testing for ARCH-effects. Because volatility clustering is, as mentioned above, usually
a property of financial time series data, we first tested for so-called ARCH effects. The
ACF of the log return series (see Figure 2) indicates very low serial correlation,
which is in line with an insignificant Box-Ljung test statistic for this series. For the squared
log return series (see Figure 3) the finding of almost no serial correlation no longer
holds. As shown by the ACF and PACF, there is clearly some correlation in the
squared log return series, which is confirmed by a significant Box-Ljung test statistic. The
Box-Ljung Q(m) statistic tests whether the first m lags of the ACF are jointly zero.
The resulting test statistic for a one-lag test, its degrees of freedom and the p-value are
given in Table 3. The test statistic and the resulting p-value indicate that the
H0 that the first lag of the ACF is zero can clearly be rejected.
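For reference, the Q(m) statistic is $Q(m) = n(n+2)\sum_{k=1}^{m} \hat{\rho}_k^2 / (n-k)$, where $\hat{\rho}_k$ is the sample autocorrelation at lag k. The sketch below computes it with the standard library only; the function names are assumptions of this sketch, and the p-value helper covers only the one-lag case (one degree of freedom), which is the case reported in Table 3.

```python
import math


def autocorr(x, k):
    """Sample autocorrelation of x at lag k."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    num = sum((x[t] - mean) * (x[t - k] - mean) for t in range(k, n))
    return num / denom


def ljung_box_q(x, m):
    """Box-Ljung Q(m) statistic for the first m autocorrelations of x."""
    n = len(x)
    return n * (n + 2) * sum(autocorr(x, k) ** 2 / (n - k) for k in range(1, m + 1))


def chi2_sf_1df(q):
    """Upper-tail probability of a chi-square variable with one degree of freedom."""
    return math.erfc(math.sqrt(q / 2.0))
```

To test for ARCH effects, the function would be applied to the squared log returns. Passing the statistic from Table 3 to `chi2_sf_1df(9.4134)` reproduces the reported p-value of about 0.00215.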


                  statistic   df   p-value
Engle (3-lags)    24.57       3    8.332e-16
Engle (12-lags)   17.55       10   2.2e-16

Table 4. Engle test results

The second statistical test used to check for ARCH effects was a Lagrange multiplier
test as proposed by Engle (1982), which is another test for conditional heteroskedasticity.
This test regresses the squared errors on their own lags and conducts an F-test of the
hypothesis that the regression coefficients are jointly zero. The findings for different lag
orders are summarized in Table 4. As the p-value for both lag orders is close to zero, the H0
can be rejected, again indicating the presence of ARCH effects.
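The auxiliary regression behind this test can be sketched as follows. Note that the sketch computes the T times R-squared form of the statistic, which under H0 is asymptotically chi-square distributed with as many degrees of freedom as lags, rather than the F form described above. The function names are assumptions, and the small linear solver is included only to keep the example free of external libraries.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def arch_lm_statistic(resid, lags):
    """T * R^2 from regressing e_t^2 on a constant and e_{t-1}^2, ..., e_{t-lags}^2."""
    e2 = [e * e for e in resid]
    y = e2[lags:]
    rows = [[1.0] + [e2[t - j] for j in range(1, lags + 1)] for t in range(lags, len(e2))]
    k = lags + 1
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)  # OLS coefficients via the normal equations
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    ybar = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - ybar) ** 2 for yi in y)
    return len(y) * (1.0 - ss_res / ss_tot)
```

Here `resid` would be the demeaned log returns. Since squared returns in this dataset are of the order of 1e-08, rescaling the returns (for example by a factor of 1,000) before calling the function is advisable for numerical stability; the statistic itself is unaffected by the scale.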

4.2. Determining the lag order. To determine the lag order of the ARCH model to be fitted,
several approaches were used. The PACF showed persistent correlation, but with some
breaks of close-to-zero partial autocorrelation at certain lags. One of these breaks was at
the 4th lag, another at the 12th lag and some more from lag 16 onwards (see Figure 2).

Because the PACF plot gave no clear indication, we based the selection of the appropriate
lag order on information criteria. The Akaike information criterion (AIC), which
is usually used to determine the best model for forecasting, suggested three lags as the
appropriate lag order for our ARCH model. Because this result confirmed the visual analysis
of the PACF, which showed the first three lags as significantly different from
zero, the ARCH(3) model was used for our further investigation, in particular for the forecast
of future volatility provided in the next section.
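The AIC comparison can be illustrated with the sketch below, which evaluates the log-likelihood of an ARCH(q) model at given parameter values and converts it into the AIC, $2k - 2\log L$, where k is the number of estimated parameters. It assumes normally distributed $\gamma_t$, which is an assumption of this sketch rather than something stated for the model; the function names and the `start` argument are likewise hypothetical. The numerical maximization over the parameters is not shown.

```python
import math


def arch_loglik(returns, mu, omega, alphas, start=None):
    """Gaussian log-likelihood of an ARCH(q) model, summed from index `start`.

    The first q observations are used only as lagged values. Passing the same
    `start` for every candidate q keeps the likelihoods comparable.
    """
    q = len(alphas)
    start = q if start is None else start
    if start < q:
        raise ValueError("start must be at least the lag order")
    eps = [r - mu for r in returns]
    loglik = 0.0
    for t in range(start, len(eps)):
        sigma2 = omega + sum(a * eps[t - j] ** 2 for j, a in enumerate(alphas, start=1))
        loglik += -0.5 * (math.log(2 * math.pi) + math.log(sigma2) + eps[t] ** 2 / sigma2)
    return loglik


def aic(loglik, n_params):
    """Akaike information criterion; smaller values are preferred."""
    return 2 * n_params - 2 * loglik
```

For an ARCH(q) model with a constant mean, `n_params` would be q + 2 (the mean, the intercept of the variance equation and q lag coefficients).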

5. Empirical Results and Forecast

5.1. Empirical Results. The fitted ARCH(3) model is shown in (3), for which
the assumption that $\gamma_t$ is distributed IID(0,1) is employed. The estimate of $\mu$
is not significant at any conventional significance level, but $\alpha_0$ (the intercept of the
variance equation, denoted $\omega$ in (1)), $\alpha_1$ and $\alpha_3$ are highly significant and
$\alpha_2$ is significant at least at the 10% level.


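\begin{align}
r_t &= -2.562 \times 10^{-6} + \varepsilon_t \notag \\
\sigma_t^2 &= 2.4408 \times 10^{-8} + 1.3987 \times 10^{-1} \varepsilon_{t-1}^2 + 3.0773 \times 10^{-2} \varepsilon_{t-2}^2 + 9.7839 \times 10^{-2} \varepsilon_{t-3}^2 \tag{3}
\end{align}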
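As a plausibility check on (3), the sketch below evaluates the fitted variance equation and the implied unconditional variance $\omega / (1 - \sum_j \alpha_j)$, a standard result for ARCH models whose lag coefficients sum to less than one. The function names are assumptions of this sketch; the coefficients are those of (3).

```python
# Coefficients of the fitted ARCH(3) variance equation (3).
OMEGA = 2.4408e-08
ALPHAS = [1.3987e-01, 3.0773e-02, 9.7839e-02]


def conditional_variance(lagged_eps, omega=OMEGA, alphas=ALPHAS):
    """sigma_t^2 given the shocks (eps_{t-1}, eps_{t-2}, eps_{t-3}), most recent first."""
    return omega + sum(a * e * e for a, e in zip(alphas, lagged_eps))


def unconditional_variance(omega=OMEGA, alphas=ALPHAS):
    """omega / (1 - sum(alpha_j)), defined when the alphas sum to less than one."""
    persistence = sum(alphas)
    if persistence >= 1.0:
        raise ValueError("unconditional variance does not exist")
    return omega / (1.0 - persistence)
```

With the estimates in (3) the implied unconditional variance is about 3.34e-08, which is of the same order of magnitude as the mean squared log return of 3.228e-08 reported in Table 2.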

5.2. Forecast. The ARCH(3) model was used to forecast the volatility of the
underlying Amazon return series. Forecasts from an ARCH volatility model are created by
recursive substitution, as with AR models. An in-sample forecast for the last 100 observations,
obtained from our ARCH(3) model, is shown in Figure 4. Volatility is predicted to stay
within a narrow band, mainly because it was declining over the whole of the subsample used
and was therefore lower at the end.
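The recursive substitution works as follows: the one-step forecast inserts the last observed squared shocks into the variance equation, and for longer horizons each squared shock that is not yet observed is replaced by its own variance forecast. A minimal sketch, with a hypothetical function name and argument layout:

```python
def arch_variance_forecast(last_eps, omega, alphas, horizon):
    """h-step-ahead conditional variance forecasts for h = 1, ..., horizon.

    last_eps: the most recent shocks, oldest first, at least len(alphas) of them.
    """
    q = len(alphas)
    history = [e * e for e in last_eps[-q:]] if q else []  # known squared shocks
    forecasts = []
    for _ in range(horizon):
        sigma2 = omega + sum(a * history[-j] for j, a in enumerate(alphas, start=1))
        forecasts.append(sigma2)
        # The expected squared value of a future shock is its forecast variance.
        history.append(sigma2)
    return forecasts
```

For example, with a single lag, `arch_variance_forecast([2.0], 1.0, [0.5], 3)` gives 3.0, 2.5 and 2.25, converging towards the unconditional variance of 2. This mean reversion is one reason why multi-step ARCH forecasts stay within a narrow band.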

To get a preliminary insight into the forecasting ability of our model, the end of the
time series used, and therefore also the actually realized values for the "forecasted" interval,
is shown in Figure 8. It can be seen that the actual volatility increases again at
the very end of the series, meaning that deviations from the mean become larger. Nevertheless,
some measure would have to be employed to capture these effects and verify the obtained
forecast in a meaningful way. For example, the forecasting error, or some similar measure
of forecasting accuracy, would be needed to correctly evaluate the forecasting ability of our
model.
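One possible accuracy measure, which was not computed for this report, is the mean squared error between the variance forecasts and the realized squared demeaned returns. The sketch below uses the squared return as a proxy for the unobserved true variance; it is a noisy proxy, so the measure is best used to compare competing models on the same data. The function name is an assumption.

```python
def variance_forecast_mse(forecasts, realized_returns, mu=0.0):
    """Mean squared error of variance forecasts against squared demeaned returns."""
    errors = [(r - mu) ** 2 - f for f, r in zip(forecasts, realized_returns)]
    return sum(e * e for e in errors) / len(errors)
```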

6. Conclusions

The ARCH(3) model employed in this forecasting project seems to fit the data quite well.
Although the mean equation is trivial, volatility is captured well, as the intercept and two of
the three lags are statistically significant at the 1% level and the second lag at least at the
10% level.

Regarding forecasting ability, the findings of this forecasting project are not unambiguous.
Other models would have to be fitted to our underlying series and their forecasts compared with
the ARCH(3) forecasts obtained here. Different types of GARCH models first come to mind when
looking for other types of models, because a well-known shortcoming of the ARCH model is the
symmetric treatment of volatility shocks, which is unrealistic for asset return series.
Further attempts at modelling our time series therefore seem necessary in order to reasonably
evaluate the forecasting performance of the employed ARCH(3) model.


[Figure 4: plot titled "Prediction with confidence intervals", x against Index; legend: X̂t+h, X̂t+h − 1.96 MSE, X̂t+h + 1.96 MSE]

Figure 4. In-sample forecast of log returns


7. Appendix

[Figure 5: (a) Boxplot - price; (b) Q-Q Plot - price, sample quantiles against theoretical quantiles]

Figure 5. Boxplot and Q-Q Plot of price

[Figure 6: histogram titled "log returns with normal distribution", frequency against log returns]

Figure 6. Histogram of log returns


[Figure 7: plot titled "Whole series", return against index]

Figure 7. Full log return series of Amazon stock

[Figure 8: plot titled "Actual series without prediction", x against Index]

Figure 8. End of the actual time series


8. References

Torben Andersen et al.: "Handbook of Financial Time Series", Springer-Verlag Berlin
Heidelberg, 2009

Robert Engle: "Autoregressive Conditional Heteroscedasticity with Estimates of the
Variance of United Kingdom Inflation", Econometrica, The Econometric Society, Vol.
50, No. 4, July 1982, pp. 987-1007

Robert Engle: "The Econometrics of Ultra-High-Frequency Data", Econometrica, Vol.
68, No. 1, January 2000, pp. 1-22

Marno Verbeek: "A Guide to Modern Econometrics", John Wiley & Sons, 2nd edition,
2004
