Geophys. J. Int. (2007) 169, 301–314 doi: 10.1111/j.1365-246X.2006.03314.x
Bayesian inference of stress release models applied to some Italian seismogenic zones
R. Rotondi and E. Varini
C.N.R. - Institute of Applied Mathematics and Information Technology, Via Bassini 15, 20133 Milano, Italy. E-mail: [email protected]
Accepted 2006 November 28. Received 2006 September 21; in original form 2006 February 13
SUMMARY
In this paper, we evaluate the seismic hazard of a region in southern Italy by analysing stress
release models from the Bayesian viewpoint; the data are drawn from the most recent version
of the parametric catalogue of Italian earthquakes. For estimation we use only the events up to 1992; then we forecast the date of the next event through a stochastic simulation method and compare the result with the shocks that actually occurred in the span 1993–2002. The original version
of the stress release model, proposed by Vere-Jones in 1978, transposes Reid’s elastic rebound
theory in the framework of stochastic point processes. Since the 1990s, enriched versions
of this model have appeared in the literature, applied to historical catalogues from China,
Iran, Japan; they envisage the identification of independent or interacting tectonic subunits
constituting the region under examination. It follows that stress release models, designed for
regional analyses, are evolving towards studies on fault segments, realizing some degree of
convergence to those models that start from an individual fault and, considering the interactions
with nearby segments, are driven to studies on regional scale. The optimal performance of the
models we consider depends on a set of choices among which: the seismogenic region and
possible subzones, the threshold magnitude, the length of the time period. In this paper, we focus
our attention on the influence of the subdivision of the region under examination into tectonic units; in the light of recent studies on the fault segmentation model of Italy we propose a partition of Sannio-Matese-Ofanto-Irpinia, one of the most seismically active regions in southern Italy.
The results show that the performance of the stress release models improves in terms of both
fitting and forecasting when the region is split up into parts including new information about
potential seismogenic sources.
Key words: Bayesian inference, Markov chain Monte Carlo methods, seismogenic sources,
stochastic simulation, stress release models.
1 INTRODUCTION
In the community of multidisciplinary experts involved in studies on seismic risk, the awareness is growing that earthquake prediction requires probabilistic methods and that a sufficiently extensive and flexible theoretical framework is provided by the theory of point processes. The large number of earthquake models proposed in the literature may be classified according to different criteria. As there is as yet
no comprehensive model of earthquake occurrence, the formulation of models can be based on phenomenological analyses or on physical
hypotheses; this subdivision has generated two classes of models with contrasting features and opposite prospects of application. Examples of
the first class are the models which describe the temporal and spatial clustering of earthquakes (Ogata 1988; 1999; Kagan 1991, and references
therein). Offspring of the generalized Omori’s law, these models belong to the class of the self-exciting models from the probabilistic viewpoint;
their properties are studied in Hawkes (1971) and Hawkes & Oakes (1974). Clustering models, born to describe the decay of secondary shocks
following a strong earthquake, have then been applied to detect anomalies in seismic activity such as precursory relative quiescence (Matsu'ura 1986; Ogata 1997). The strong time asymmetry in the foreshock–main shock–aftershock sequences makes the prediction provided by these models accurate only in the immediate aftermath of the main shock (Kagan 1991); hence, as their prediction lead time is very small,
they are useful when the damage mitigation policy is based on short-term strategies with low alarm-initiation cost (Kagan 1997). On the
other hand, every catalogue shows earthquake clustering; hence we can either avoid the issue by declustering the catalogue or allow for this
feature by inserting in the forecasting procedure a component which models this trend of the seismic activity. A big effort in this direction is
represented by the long- and short-term forecasting model currently tested on the northwest and southwest Pacific regions (Kagan & Jackson
2000) and in southern California (Kagan et al. 2003). The long-term, time-independent component of the hazard is obtained by smoothing
© 2007 The Authors 301
Journal compilation © 2007 RAS
catalogue data through kernel functions, whereas the short-term forecast is modelled by a branching process where the seeds of the clusters
follow a Poisson point process with a constant rate. Another approach is presented by Faenza et al. (2003, 2004) and Cinti et al. (2004); they
use a semi-parametric model, the Cox regression model, assuming that the hazard function at time t, the time elapsed since the last event, depends also on tectonic/physical factors besides the usual catalogue data. The application to the Italian territory indicates a time clustering of the large earthquakes for a few years, followed by a constant trend of the hazard. We recall that the Cox model, as defined in these papers, implies
two assumptions: the first that the hazard ratio for any two locations is constant in time, because it does not include the baseline function;
the second that independent covariates affect the hazard in a multiplicative way because of the use of the exponential function for linking the
covariates to the hazard.
Most of the models built on the recurrence physical hypothesis go back to the elastic rebound theory of Reid, among them the seismic
gap and the characteristic earthquake models (Nishenko 1991). According to these models large earthquakes are quasi-periodic; after a strong
shock the probability of a further large event is low, and it increases as time elapses. Moreover, the earthquake size distribution on ‘individual’
faults follows the ‘characteristic earthquake’ model, that is on each fault segment a large earthquake is exclusively characteristic. According
to Kagan (1991, p. 513) one of the reasons for the unsuccessful performance of these models is that they assume that all the slip is released
in characteristic earthquakes of the same size, while in fact considerable slip is released by both smaller and larger earthquakes. Due to
the stochastic nature of seismicity the relationship between strain accumulation and its release must be expressed in probabilistic terms as
is done, for instance, by the stress release (SR) models, translation of Reid’s theory in the framework of stochastic point processes. These
models were introduced by Vere-Jones in 1978 and then applied to historical catalogues of Japan, New Zealand, Iran and China, in both their
original form, which we call simple, and subsequent versions enriched by geophysical information (Vere-Jones & Yonglu 1988; Zheng & Vere-Jones 1991, 1994). In these studies the models appear to fit, better than Poisson models, sets of large earthquakes that occurred in large areas on long
timescales. Despite the intuitive physical assumptions supporting these recurrence models, we have to recognize that the catalogues testify
to the presence of clusters even of large events in contrast with the expected quiescence after a strong earthquake. A partial answer to this
issue comes from the version of the SR model named linked; it explains clusters of large events in terms of stress transfer among different, possibly closely related faults (Bebbington & Harte 2003, and references therein). An intermediate version between the simple and the linked SR models is the one called independent, where the region under examination is decomposed into different, non-interacting seismogenic zones, homogeneous from the seismotectonic viewpoint. Clusters of the original data set could be formed by events recorded in the same period in such zones.
The aim of this paper is to analyse a particularly active region of southern Italy enriching the statistical modelling with recent geological
achievements in the understanding and characterization of Italian seismicity (Section 2), collected in the Database of Individual Seismogenic
Sources (DISS) (Valensise & Pantosti 2001b). By seismogenic source the authors mean a simplified and georeferenced 3-D representation of a fault plane; each source either generated large-magnitude earthquakes in the past or is likely to experience a significant earthquake in the future. We have then treated the sources located in the southern Apennines as subunits of that seismogenic region. As data set we have used
the catalogue CPTI04 (CPTI Working Group 2004), the most recent version of the parametric catalogue of damaging earthquakes in the Italian
area. It is the result of the revision and extension of the previous catalogue CPTI99, in its turn a fusion of different catalogues, mainly NT4.1.1 (Camassi & Stucchi 1997) and CFTI 1–2 (Boschi et al. 1997), in the perspective of seismic hazard assessment. CPTI04 covers the period
from the Ancient World to 2002 and contains 2550 records of main shocks with epicentral intensity I_0 ≥ V–VI on the Mercalli-Cancani-Sieberg scale, or magnitude M_S ≥ 4.0. The most remarkable developments concern the part related to the interval 1981–2002: (i) the gaps in the time window 1981–1992 have been filled; (ii) empirical relationships have been estimated to obtain M_S from the instrumentally evaluated magnitude M_L. The compilers have not included in the catalogue the events falling into time–space windows of 30 km and ±90 days centred on main shocks. From the catalogue CPTI04 we have drawn the earthquakes that occurred in the Sannio-Matese-Ofanto-Irpinia region in the time period 1650–1992 with magnitude M_S ≥ 4.8. This magnitude threshold makes the events considered consistent with Reid's
theory, that is, appropriate to be described by stress release models. In Section 3, we recall some versions of the SR model and apply the simple SR model to the entire data set, whereas the independent SR model is fitted to the set partitioned according to geological information.
Following the Bayesian paradigm we insert in the analysis information obtained from previous studies (Rotondi & Varini 2003a) by assigning the prior distributions of the model parameters. In such a way we take into account the uncertainty in the evaluation of the conditional
intensity function characterizing the particular point process model. A further advantage of the Bayesian approach consists in providing, besides
an estimator, the whole posterior distribution of each quantity to be estimated, a tool for quantifying the uncertainty of the estimator. The greater
flexibility of the modelling requires more powerful inferential methodologies; in our case as it is not possible to give the posterior distributions
explicitly, we have estimated them by generating sufficiently large samples from them through Markov chain Monte Carlo (MCMC) methods.
In Appendix A, we give a concise description of these methods referring to Gilks et al. (1995) for more information. The goodness of fit
of the estimated models is evaluated through residual analysis, and the models are compared with each other on the basis of the Bayes factor. Finally, we forecast the next step in the time evolution of the process, that is, the date of the next earthquake. In fact, the stress release models give time-dependent hazard assessments conditionally on the entire seismic history of the region under examination. From this viewpoint, forecasting the next event means simulating the time evolution of the process; we have done this through a stochastic simulation method, the timescale
transformation method, both using the plug-in estimate of the conditional intensity and the same intensity with parameter values sampled
from their posterior distributions. The forecasts, given in terms of mean and quantiles of the distribution of the date of the next event, are
compared with the events that actually occurred in the period 1993–2002 (Section 4).
2 SEISMOGENIC ZONES AND CATALOGUE
One of the crucial factors affecting the performance of the stress release models is the identification of suitable seismogenic regions. The southern Apennines are one of the portions of the Italian peninsula that experience the most frequent and destructive seismic activity; in the most
recent zonation ZS9 of the Italian territory (MPS04 Working Group 2004) this area corresponds to the zone 927, depicted in Fig. 1, which
coincides largely with the region, homogeneous from the viewpoint of the kinematic context and the expected rupture mechanism, formed
by the four zones 58, 62, 63 and 64 of the previous zonation ZS4 (Meletti et al. 2000). Changes to ZS4 have been done to make the zonation
compatible with the new catalogue CPTI04 and to acknowledge the innovative contribution of the database of the potential seismogenic
sources DISS (Valensise & Pantosti 2001b). This database is the result of a project developed by the scientists of the Istituto Nazionale di
Geofisica e Vulcanologia (INGV) starting from 1996; its goal is a fault segmentation model of Italy based on the assumption that seismicity may be approximated by a finite number of potential seismogenic sources, surface projections of inferred faults which are likely to experience a significant earthquake in the future. According to the recent research achievements of this project, several large historical earthquakes that occurred in the relatively narrow belt of seismicity characterizing the southern Apennines are associated with inferred faults, either blind or hidden (see
Fig. 1 in Valensise & Pantosti 2001a). In particular, for the earthquakes of 1688, 1732 and 1857, of magnitude M_S = 6.72, 6.61 and 6.96 respectively, we refer to Valensise et al. (1993), and for the shock of 1805, M_S = 6.57, to Cucci et al. (1996).
The average quality of buildings and the enormous cultural heritage make Italian cities especially vulnerable to earthquakes that would
not be as dangerous in other earthquake-prone countries; therefore we generally consider events with magnitude M_S 5+ as strong. In the new catalogue CPTI04 the magnitude of several events has been re-evaluated, and this process has led on average to a reduction of the magnitude value by 0.2. Hence, for homogeneity with previous studies, we have used M_S = 4.8 as the threshold magnitude in the present analysis. The data set
consists of 35 events; the analysis of completeness, performed through the identification of the change point (Rotondi & Garavaglia 2002), indicates that the data set can be considered complete from the second half of the seventeenth century. Table 1 provides the summaries
of the posterior distributions of the change-point s, date from which the catalogue is considered complete, and of the rate h2 of the complete
part of the catalogue. The graphical representation of these results is given in Fig. 2.
We use the 24 events recorded from 1650 to 1992 to make inference, forecast the time of the next event and compare it with the actual occurrences in the interval 1993–2002. In the light of all available historical and geological evidence we have subdivided the zone 927 into
Figure 1. Seismogenic zone 927 of the Italian zonation ZS9 and epicentres of the shocks with magnitude M_S ≥ 4.8 in the time interval 1650–2002. The symbols mark the events belonging to the different subunits denoted by 1, 2, 3 and 4 in the text: circle = 1, inverse triangle = 2, square = 3, triangle = 4; the star indicates the Calabrian-Lucano Apennines earthquake of 1998/9/9. The symbol size scales with magnitude.
Table 1. Zone 927. Results on the completeness of the data set constituted by events of magnitude M_S ≥ 4.8. Summaries of the posterior distributions of the change point s and of the rate h2 of the part of the catalogue considered complete.
            Mean      Mode      Standard deviation
s           1665.00   1688.42   55.03
h2          0.0751    0.070     0.0151

Quantiles   q0.025   q0.05    q0.25    q0.5     q0.75    q0.95    q0.975
s           1446.2   1557.8   1666.3   1678.6   1685.1   1691.5   1697.6
h2          0.05     0.05     0.06     0.07     0.09     0.10     0.11

0.95 HPD interval   q_l    q_h
h2                  0.04   0.10
Figure 2. Zone 927—Graphic representation of the results on completeness. Above: posterior mean rate (solid line), cumulative number of events (dotted line),
and seismic history. Below left: estimated probability density of the position of the change point; below right: estimated probability density of the rate h2.
four subunits, each characterized by the presence of a fault segment which generated at least one earthquake with M_S ≥ 6 in the past and containing a number of events at least equal to the number of model parameters (in our case n ≥ 3). Table 2 contains the data subdivided
by subunit; we are uncertain about the classification of the 1826 historical earthquake which could be assigned to either the subunit 3 or 4.
Its causative fault is unknown, but if we consider the transverse lineaments taken from the Synthetic Structural-Kinematic Map of Italy then
it should be separated from the fault segment that generated the 1694 and 1980 strong (M_S > 6) earthquakes. Hence our preferred allocation is to subunit 4; we have also examined the alternative classification, and the results obtained are not significantly different, as shown in
Section 4.
Table 2. From the catalogue CPTI04, list of the earthquakes with M_S ≥ 4.8 that occurred in the zone 927 of the zonation ZS9 in the interval 1650–1992. The number in the right-hand column denotes the membership subzone, numbered from north to south; the correspondence with the symbols in Fig. 1 is: 1 = circle, 2 = inverse triangle, 3 = square and 4 = triangle.
Date Latitude Longitude M_S Zone Date Latitude Longitude M_S Zone
1688/06/05 41.280 14.570 6.72 1 1857/12/16 40.350 15.850 6.96 4
1694/09/08 40.880 15.350 6.87 3 1858/08/06 40.750 15.550 4.80 3
1702/03/14 41.120 14.980 6.32 2 1882/06/06 41.550 14.200 4.96 1
1708/01/26 39.922 16.126 5.46 4 1885/09/17 41.133 14.800 4.80 2
1732/11/29 41.080 15.050 6.61 2 1893/01/25 40.583 15.417 4.80 3
1794/06/12 41.000 15.000 4.80 2 1905/11/26 41.134 15.028 5.02 2
1805/07/26 41.500 14.470 6.57 1 1910/06/07 40.900 15.420 5.84 3
1807/11/11 40.297 15.845 4.80 4 1914/12/19 41.583 14.250 4.80 1
1826/02/01 40.520 15.730 5.56 3/4 1941/09/07 41.200 15.000 4.80 2
1831/01/02 40.082 15.785 5.23 4 1962/08/21 41.130 14.970 6.19 2
1836/11/20 40.150 15.780 5.79 4 1980/11/23 40.850 15.280 6.89 3
1853/04/09 40.820 15.220 5.89 3 1982/03/21 40.008 15.766 5.05 4
3 STRESS RELEASE MODELS: ESTIMATION AND SIMULATION
Let us consider the two data sets D = {t_i, m_i}_{i=1,...,n} and U = {t_i, m_i, r_i}_{i=1,...,n} where, for each i = 1, ..., n, t_i = T_i − T_0, T_i is the date and m_i the magnitude of the ith event, T_i ∈ (T_0, T_f) (in our case T_0 = 1650 and T_f = 1992), m_i > M_0 = 4.79, the magnitude threshold, and r_i is the membership subunit of the ith event as defined in Section 2. Let us assume that at time t the stress present in the region is given by X(t) = X(0) + ρt − S(t), where S(t) = Σ_{i: t_i < t} 10^{0.75 (m_i − M_0)} is the sum of the stress released, according to Benioff's formula, by all the events over the period (0, t), ρ > 0 is the constant unknown loading rate and X(0) is the initial stress level. Following the proposal of
Zheng & Vere-Jones (1994), we factor the conditional intensity function characterizing the model into the product of a risk function ψ(X(t)) – the probability of occurrence in (t, t + dt), given the stress level at time t – and of the density function f(m) for the magnitude, and we obtain λ(t, m | H_t) = ψ(X(t)) f(m | X(t)), where H_t is the history of the process up to t. In Rotondi & Varini (2003a) we have studied
the case of magnitude distributions depending on the history of the process up to time t; here, for the sake of simplicity, we assume that
f(m) is independent of time and therefore does not affect the estimate of the time component of the hazard. The risk function is to be chosen among the convex functions increasing to infinity; the forms used in the literature are the positive part ψ(X(t)) = [µ + ν X(t)]^+ in Vere-Jones (1978), Rotondi & Varini (2003b), and the exponential ψ(X(t)) = exp{µ + ν X(t)} in all the other works. The parameter ν can be
interpreted as the response of the region to the stress accumulated and is related to the strength and the heterogeneity of the crust (Bebbington
& Harte 2003). Here we have followed the exponential choice because, on the one hand, there are no physical reasons to discriminate between
the two functional forms and, on the other hand, the results obtained in the above-mentioned studies are not very sensitive to this option. We
examine two possibilities: (a) a single stress regime X(t) acts over the whole region and (b) different, independent stress levels X_j(t), j = 1, ..., 4, exist in each of the four subunits constituting the region, each responding separately to the stress. Case (a) is modelled by a simple SR model on the data set D:
λ_S(t | H_t) = exp{α + ν (ρt − S(t))},   (1)

where α = µ + ν X(0), and case (b) by an independent SR model on the data set U:
λ_I(t | H′_t) = Σ_{j=1}^{4} ψ_j(X_j(t)) = Σ_{j=1}^{4} exp{α_j + ν_j (ρ_j t − S_j(t))},   (2)

where α_j = µ_j + ν_j X_j(0) and S_j(t) = Σ_{i: t_i < t, r_i = j} 10^{0.75 (m_i − M_0)}, that is, only the events belonging to the jth subunit enter into the sum. A
variant of this model with interesting physical interpretations is obtained by assuming that the tectonic input is the same in all the subunits,
that is ρ j = ρ, j = 1, . . . , 4. This means that the subunits are subjected to the same tectonic regime, but their geological structure, and hence
their sensitivity, is different. To distinguish the conditional intensity function of this particular independent model we denote it by λ_{Iρ}. The
time component of the log-likelihood is given by:
log(L) = Σ_{i=1}^{n} log λ(t_i | H_{t_i}) − ∫_{t_0}^{t_{n+1}} λ(t | H_t) dt
in the simple model, whereas H′_t replaces H_t in the independent model and in its variant with common tectonic loading rate. If we set all the parameters except the α's to zero in (1) and (2), then we obtain homogeneous Poisson processes with constant rate in the region and in each subunit, respectively. In this case α represents the logarithm of the rate. We will use this formulation to make inference on the Poisson models
by means of the same procedure used for the SR models.
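To make the likelihood computation concrete, the time component of the log-likelihood of the simple SR model (1) can be sketched as follows. Since S(t) is a step function, the compensator integral is analytic on each inter-event interval; this is an illustrative sketch of ours, not the authors' code, and the function names and test values are invented.

```python
import math

def benioff_release(m, M0=4.79):
    # Benioff stress drop 10^{0.75 (m - M0)} for one event of magnitude m
    return 10.0 ** (0.75 * (m - M0))

def simple_sr_loglik(alpha, nu, rho, times, mags, t0=0.0, tf=None, M0=4.79):
    """Time component of the log-likelihood of the simple SR model,
    lambda_S(t) = exp{alpha + nu*(rho*t - S(t))}.  S(t) is piecewise
    constant, so on each inter-event interval
      int_{a}^{b} e^{alpha + nu(rho t - S)} dt
        = e^{alpha - nu S} (e^{nu rho b} - e^{nu rho a}) / (nu rho)."""
    if tf is None:
        tf = times[-1]
    drops = [benioff_release(m, M0) for m in mags]
    # sum of log-intensities at the event times (S evaluated just before each event)
    log_sum, S = 0.0, 0.0
    for t, d in zip(times, drops):
        log_sum += alpha + nu * (rho * t - S)
        S += d
    # compensator: piecewise-analytic integral over (t0, tf)
    knots = [t0] + list(times) + [tf]
    levels = [0.0]
    for d in drops:
        levels.append(levels[-1] + d)
    integral = sum(math.exp(alpha - nu * S_lev)
                   * (math.exp(nu * rho * b) - math.exp(nu * rho * a)) / (nu * rho)
                   for a, b, S_lev in zip(knots[:-1], knots[1:], levels))
    return log_sum - integral
```

Setting nu and rho to recover the Poisson case, or summing such terms over the four subunits, reproduces the other likelihoods discussed above.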
Inference on the parameters θ = (α, ν, ρ), {θ_j = (α_j, ν_j, ρ_j)}_{j=1}^{4}, and {θ_j = (α_j, ν_j, ρ)}_{j=1}^{4} has been performed following the Bayesian
paradigm. First, to take into account the uncertainty on the value of the parameters we have assigned their prior distributions on the basis
of estimates found in the literature and of information drawn from previous studies (Rotondi & Varini 2003a). Table 3 summarizes the prior
distributions and their parameters. The assignments do not change in the different SR models; on the contrary, in the Poisson models the value
Table 3. Prior distributions for the parameters of the stress release and Poisson models.
Model Parameter Prior Mean Variance
Stress release α Normal −5. 9.
(simple and independent) ν Gamma 0.1 0.01
ρ Gamma 0.5 0.25
Poisson (in the region) α Normal −3. 1.
Poisson (in each subunit) α Normal −4.5 1.
Table 4. Proposal distributions used in the Metropolis–Hastings algorithm.
Candidate value Proposal Mean Variance
of the chain
α(i) Normal α(i−1) 0.5
ν(i) Lognormal ν(i−1) 0.0001
ρ(i) Lognormal ρ(i−1) 0.2
of the α parameter decreases by about log 4 when moving from the entire region to the four subunits, as α is approximately the logarithm of the expected occurrence rate and the rate of each subunit is taken equal to one fourth of the regional rate.
In principle, the posterior distributions are obtained through Bayes' theorem, but in our case it is impossible to evaluate the required integrals analytically; therefore, we resort to Markov chain Monte Carlo techniques, in particular to the Metropolis–Hastings algorithm, to obtain
samples from the posterior distributions of the parameters. The procedure mainly consists in designing for each parameter a Markov chain
whose long-time equilibrium is the corresponding posterior distribution. At each iteration a new candidate element of the chain is drawn
from a suitable distribution, called proposal; with a certain probability it is accepted and the chain moves, otherwise it is rejected and the
chain remains at the current value. The ergodic averages of suitable functions of the generated sample converge to the desired summaries of
the posterior distribution like mean, variance, and quantiles. Moreover, kernel smoothing techniques applied to the sampled chains provide
estimates of the posterior densities. In Table 4, we have listed the proposals used in the Metropolis–Hastings algorithm; the indices mean that, for instance, for the parameter α the candidate value α^(i) at the ith iteration is generated from a normal distribution centred on the previous element of the chain, α^(i−1), and with the variance reported in Table 4. The main ideas of the MCMC methods are outlined in Appendix A; for more details we refer to Gilks et al. (1995) and references therein. After evaluating the posterior means of the parameters we have substituted them into the likelihoods of the different models and obtained their plug-in estimates, which has allowed us to compare the fit of the models to the data set.
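The Metropolis–Hastings mechanics described above can be sketched in a few lines. This is a minimal illustration of ours on a toy one-dimensional target (a standard normal log-density), not the actual SR posterior; the function names and tuning values are invented.

```python
import math
import random

def metropolis_hastings(log_target, x0, proposal_sd=0.7, n_iter=20000, seed=42):
    """Random-walk Metropolis: propose x' ~ N(x, sd^2) and accept with
    probability min(1, target(x')/target(x)); the proposal is symmetric,
    so no Hastings correction appears in the ratio."""
    rng = random.Random(seed)
    x = x0
    chain = []
    for _ in range(n_iter):
        cand = x + rng.gauss(0.0, proposal_sd)
        log_ratio = log_target(cand) - log_target(x)
        if log_ratio >= 0.0 or rng.random() < math.exp(log_ratio):
            x = cand          # accepted: the chain moves
        chain.append(x)       # otherwise the chain stays at the current value
    return chain

# toy target: standard normal log-density; ergodic averages of the sample
# approximate the posterior summaries (mean, variance, quantiles)
chain = metropolis_hastings(lambda x: -0.5 * x * x, x0=0.0)
burned = chain[5000:]                       # discard a burn-in period
post_mean = sum(burned) / len(burned)
```

For the positive parameters ν and ρ the paper uses lognormal proposals centred on the current value; such proposals are not symmetric, so the acceptance ratio must include the proposal-density correction q(x | x′)/q(x′ | x), which for a lognormal proposal of this kind equals x′/x.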
The goodness of fit of the different models can also be evaluated through another statistical tool: the residual analysis (Ogata 1988). The
1–1 transformation of the time from t to τ = Λ(t), Λ being the cumulative intensity function of any of the considered models, changes the occurrence dates t_1, t_2, ..., t_n into τ_1, τ_2, ..., τ_n, which are distributed as a standard stationary Poisson process if λ(·) is the intensity function of the process that really generated the data. Standard tests for stationarity, independence, and exponentially distributed interevent times
can be used to determine whether there are systematic deviations of the data from the fitted model.
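The time-rescaling step behind the residual analysis can be sketched as follows; the helper names are ours, and the closing check uses an invented toy intensity rather than an SR model.

```python
import math

def residual_process(event_times, Lambda):
    """Time rescaling: tau_i = Lambda(t_i).  Under the true model the tau_i
    form a unit-rate Poisson process, i.e. the gaps are i.i.d. Exp(1)."""
    taus = [Lambda(t) for t in event_times]
    return [b - a for a, b in zip([0.0] + taus[:-1], taus)]

def ks_exp1(gaps):
    """Kolmogorov-Smirnov distance between the empirical cdf of the rescaled
    interevent times and the Exp(1) cdf 1 - exp(-x); large values signal
    systematic deviations of the data from the fitted model."""
    xs = sorted(gaps)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        F = 1.0 - math.exp(-x)
        d = max(d, abs(F - i / n), abs(F - (i + 1) / n))
    return d

# toy check with data generated exactly from a model: lambda(t) = 2t, so
# Lambda(t) = t^2, and events at t_i = sqrt(i) rescale to perfectly regular gaps
gaps = residual_process([math.sqrt(i) for i in range(1, 6)], lambda t: t * t)
```

In practice Λ would be the estimated cumulative intensity of the simple or independent SR model, evaluated by the same piecewise integration used for the likelihood.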
3.1 Comparison of models through the Bayes factor
Consistently with the Bayesian approach we are following, we adopt the Bayes factor as a quantitative measure of the evidence in favour of a model (Kass & Raftery 1995). Given the models M1, M2 and the data set D, the Bayes factor is the ratio of the posterior odds of M1 to its prior odds, that is:

B_12 = pr(D | M1) / pr(D | M2) = [pr(M1 | D) / pr(M2 | D)] ÷ [pr(M1) / pr(M2)].   (3)
When the prior probabilities of the two competing hypotheses are equal, the Bayes factor coincides with the posterior odds. The densities pr(D | M_k), k = 1, 2, are obtained by integrating over the parameter space with respect to the prior distributions:

pr(D | M_k) = ∫ pr(D | θ_k, M_k) π(θ_k | M_k) dθ_k,   (4)
where π(θ_k | M_k) is the prior density of the parameter θ_k under M_k, and pr(D | θ_k, M_k) is the likelihood function of θ_k. The quantity pr(D | M_k) is a marginal (or integrated) likelihood; it is also referred to as the evidence for M_k. Integration with respect to the prior distributions allows one to take into account the different numbers of parameters in the models, besides the uncertainty on their values, as is done in frequentist statistics by model selection criteria like AIC. To clarify how the evidence penalizes highly complex models we follow the interpretation given by Bishop (1995) of the expression (4). Let us consider a single parameter θ. The integrand of (4) is proportional to the posterior density of θ; if it is sharply peaked around the most probable value, the mode θ_MP, then we can approximate the integral (4) by

pr(D | M_k) ≈ pr(D | θ_MP, M_k) π(θ_MP | M_k) Δθ_posterior,   (5)
where Δθ_posterior is the width of the peak. If we take the prior π(θ) to be uniform over some large interval Δθ_prior and we approximate its value by 1/Δθ_prior, then (5) becomes

pr(D | M_k) ≈ pr(D | θ_MP, M_k) (Δθ_posterior / Δθ_prior).
The expression on the right-hand side is the product of the likelihood evaluated at the most probable parameter value and of a ratio, called the Occam factor, with value < 1. For a model with many parameters, each will generate a similar Occam factor, which reduces the evidence. Hence the best model according to the Bayes factor emerges from the balance between fitting the data well (large likelihood) and not being too complex (relatively large Occam factor).
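A toy numerical illustration of the Occam factor in (5), under assumptions of our own choosing (a one-parameter model y_i ~ N(θ, 1) with a flat prior of given width): widening the prior leaves the likelihood untouched but shrinks the Occam factor, and hence the evidence, proportionally.

```python
import math

def occam_approx(ys, prior_width):
    """Evidence via (5): pr(D|M) ~= L(theta_MP) * Occam factor, for the toy
    model y_i ~ N(theta, 1) with a flat prior of width `prior_width`.
    theta_MP is the sample mean, and the Gaussian posterior peak has width
    Delta_post = sqrt(2*pi/n), so the Occam factor is Delta_post/prior_width."""
    n = len(ys)
    theta_mp = sum(ys) / n
    log_lik = sum(-0.5 * (y - theta_mp) ** 2 - 0.5 * math.log(2 * math.pi)
                  for y in ys)
    delta_post = math.sqrt(2.0 * math.pi / n)   # width of the likelihood peak
    return math.exp(log_lik) * delta_post / prior_width
```

Doubling the prior width halves the evidence even though the fit is unchanged, which is exactly the complexity penalty discussed above.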
The computational difficulties concerning the evaluation of the multidimensional integral in (4) can be solved by numerical methods or, better, by Monte Carlo methods. Simple Monte Carlo integration is based on sampling from the priors π(θ_k | M_k); its precision can be improved by importance sampling, which consists in generating a sample {θ_k^{(i)}; i = 1, ..., M} from another density π*(θ_k | M_k), called the importance sampling function, and in approximating the integral (4) by

Σ_{i=1}^{M} w_i pr(D | θ_k^{(i)}, M_k) / Σ_{i=1}^{M} w_i,   (6)

where w_i = π(θ_k^{(i)} | M_k) / π*(θ_k^{(i)} | M_k). In our case it is particularly convenient to exploit the sequences already generated by the Metropolis–Hastings algorithm for estimation, and to adopt the posterior density as importance function. Substituting in (6) yields the harmonic mean of the likelihood values as an estimate of pr(D | M_k):
p̂r(D | M_k) = {(1/M) Σ_{i=1}^{M} pr(D | θ_k^{(i)}, M_k)^{−1}}^{−1}.   (7)
This converges almost surely to the correct value, pr(D | M_k), as M → ∞, but it may have infinite variance because of the occasional occurrence of a value of θ_k^{(i)} with a small likelihood and hence a large effect on the final result (Raftery et al. 2006). If the values θ_k^{(i)} were drawn from the prior density, the estimator (7) would satisfy the Gaussian central limit theorem; a solution could then consist in using as importance sampling function a mixture of the prior and posterior densities. To avoid a double sampling, we add to the M samples from the posterior a further δM/(1 − δ) values of θ drawn from the prior and associate with each of them the expected value of the likelihood, pr(D | M_k) (Newton & Raftery 1994). This leads to approximating the evidence pr(D | M_k) by
p̂r(D | M_k) = [δM/(1 − δ) + Σ_{i=1}^{M} pr(D | θ_k^{(i)}, M_k) / {δ p̂r(D | M_k) + (1 − δ) pr(D | θ_k^{(i)}, M_k)}] / [δM/(1 − δ) p̂r(D | M_k)^{−1} + Σ_{i=1}^{M} {δ p̂r(D | M_k) + (1 − δ) pr(D | θ_k^{(i)}, M_k)}^{−1}].
The estimator p̂r(D | M_k) is evaluated by using a simple iterative scheme which generally converges quickly. Following the hints given by Newton & Raftery (1994), we adopted δ as small as 0.01 and did not observe any of the instability affecting the estimator (7).
Among the properties of the Bayes factor we recall that it does not require the alternative models to be nested (Kass & Raftery 1995); moreover, the Bayes factor can be used to calculate the posterior probabilities of competing models. When one suspects that none of the considered models is correct, any approach that selects one of them and then draws inference conditionally on that model ignores the uncertainty involved in model selection. This difficulty can be avoided by performing composite estimation and prediction through the posterior probabilities of the models provided by the corresponding Bayes factors (Hoeting et al. 1999; Wasserman 2000).
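Given the log10 marginal likelihoods, the posterior model probabilities follow by simple normalization; a minimal sketch, here fed with the values that appear in Table 7:

```python
import numpy as np

def posterior_model_probs(log10_ev, prior=None):
    # Posterior probabilities of competing models from their log10 marginal
    # likelihoods, assuming equal prior model probabilities by default;
    # the max-shift keeps the exponentiation well scaled.
    log10_ev = np.asarray(log10_ev, dtype=float)
    if prior is None:
        prior = np.full(log10_ev.size, 1.0 / log10_ev.size)
    w = prior * 10.0 ** (log10_ev - log10_ev.max())
    return w / w.sum()

# log10 pr(D | M) for MP, MS, MP4, MI, MIrho as reported in Table 7
p = posterior_model_probs([-52.78, -52.96, -53.71, -51.26, -50.59])
```

With equal priors, the model MIρ receives roughly 0.8 of the posterior probability, in line with the Bayes factors of Table 7.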
3.2 Forecasting the time-to-the-next event
We have then tackled the problem of forecasting the date of the next event by simulating the distribution of the waiting time. The thinning method (Ogata 1981; Bebbington & Harte 2003) provides a general solution to the problem of simulating a point process. In our case it is also possible to apply an alternative and more efficient simulation method, the timescale transformation method, which is particularly appropriate when it is easy to invert the cumulative intensity function Λ(t | ·). The algorithm is composed of the following steps:
(i) generate u ∼ Unif(0, 1);
(ii) the time-to-next-event is given by

$$t' = \frac{1}{\nu\rho}\,\log\left\{1 - \frac{1}{A}\log(1-u)\right\}, \quad \text{where} \quad A = \frac{1}{\nu\rho}\exp\{\alpha + \nu\,(\rho T - S(T))\}.$$
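Steps (i)–(ii) amount to inverse-transform sampling of the waiting time. A minimal Python sketch follows; the parameter values α, ν, ρ, T and S(T) used in the call are illustrative placeholders, not the estimates of Table 5:

```python
import numpy as np

rng = np.random.default_rng(0)

def waiting_time(alpha, nu, rho, T, S_T, size=1):
    # Timescale-transformation simulation of the waiting time t' after the
    # end T of the observation period for a simple stress release model;
    # S_T is the accumulated stress release S(T).
    u = rng.uniform(size=size)                       # (i) u ~ Unif(0, 1)
    A = np.exp(alpha + nu * (rho * T - S_T)) / (nu * rho)
    return np.log1p(-np.log1p(-u) / A) / (nu * rho)  # (ii) invert Lambda

# illustrative parameter values only
t_next = waiting_time(alpha=-2.8, nu=0.008, rho=0.8, T=340.0, S_T=260.0,
                      size=10_000)
```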
Details of these and other simulation procedures are given in Wang et al. (1991). As regards the independent SR models, we simulate the distribution of the time t′_j to the next earthquake in each subunit by applying the above-mentioned procedure to each ψ_j(·) separately; then the minimum t′ = min_{j=1,…,4} t′_j provides the time-to-next-event in the overall region. In this way we have estimated the distribution of the time of the next event in each subunit as well as in the global region. Suppose the simulation is repeated M times. The frequency

$$\hat{\pi}_j = \frac{1}{M}\sum_{i=1}^{M} I\left(t'_i = t'_{ij}\right)$$

of the sweeps in which the simulated next event comes from the jth subunit provides a rough estimate of the probability that the next earthquake will occur in that subunit.
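The sweep over subunits can be sketched as follows; the per-subunit parameters are again illustrative placeholders and the waiting-time formula is the one of steps (i)–(ii) above:

```python
import numpy as np

rng = np.random.default_rng(1)

# One (alpha, nu, rho, S_T) tuple per subunit -- illustrative values only.
params = [(-4.6, 0.06, 0.20, 60.0), (-4.4, 0.07, 0.20, 55.0),
          (-4.4, 0.10, 0.20, 70.0), (-5.8, 0.10, 0.20, 65.0)]
T, M = 340.0, 100_000

# Draw a waiting time t'_j for every subunit in each sweep, keep the minimum
# as the regional time-to-next-event, and estimate pi_j as the frequency of
# sweeps in which subunit j produced that minimum.
t = np.empty((M, len(params)))
for j, (alpha, nu, rho, S_T) in enumerate(params):
    u = rng.uniform(size=M)
    A = np.exp(alpha + nu * (rho * T - S_T)) / (nu * rho)
    t[:, j] = np.log1p(-np.log1p(-u) / A) / (nu * rho)

t_region = t.min(axis=1)
pi_hat = np.bincount(t.argmin(axis=1), minlength=len(params)) / M
```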
C© 2007 The Authors, GJI, 169, 301–314
Journal compilation C© 2007 RAS
Lu & Vere-Jones (2001) observe that ‘additional uncertainty arises from the fact that the parameters are fitted from limited data runs and so are not known exactly. To capture this additional uncertainty, each simulation should start by first selecting parameter values from a posterior distribution for such values. This will somewhat increase the spread of the simulations and so decrease the predictability . . . .’ The Bayesian setting is the most suitable to deal with the uncertainty about the parameters, because the estimation procedure provides the probability distribution of each parameter besides its point estimate. In the simulation of the time-to-next-event we have applied the timescale transformation method first to the plug-in estimates of the conditional intensities λ_S(· | H_T, θ) (1) and λ_I(· | H_T, θ), λ_{Iρ}(· | H_T, θ) (2), and then to the same intensities evaluated at each parameter vector of a sample of size M drawn from the posterior distribution.
4 R E S U L T S
Summarizing, the models examined are: Poisson (M_P) and simple SR (M_S) on the entire region; Poisson (M_{P4}) and independent SR with (M_{Iρ}) and without (M_I) common tectonic loading rate on the 4 subunits. For each of them the posterior mean and standard deviation of the parameters have been estimated using the ergodic averages of suitable functions of the Markov chains produced by the Metropolis–Hastings algorithm; these estimates are reported in Table 5. The MCMC sampling scheme followed in estimation consists in generating a chain of 10^7 elements for each parameter, recording the output every 250th iteration to reduce the autocorrelation within the chain and discarding the initial 20 per cent of scans to avoid starting biases. The plug-in estimates of the log-likelihood functions are reported in Table 5 as well; in particular, for the independent SR models the log-likelihood related to each subunit is given in the last column. Supposing that the prior probabilities of the models are equal, the Bayes factor (3), expressed in Log10, of the simple SR and Poisson models is Log10 pr(D | M_S) − Log10 pr(D | M_P) = −38.75 + 38.56 = −0.19, that is, slight evidence in favour of M_P according to Jeffreys’ scale recalled in Table 6.
Let us examine what happens when geophysical information is added, that is, when the region is subdivided into seismogenic sources. We point out that the models M_P and M_S are applied to the smaller data set D ⊂ U; in order to compare the values of their likelihood functions with those of the other models we have to add the membership probability of each event to a subunit; these probabilities are roughly estimated through the frequencies n_j/n, where n_j, j = 1, …, 4, is the number of events in the jth subunit, that is, the number of r_i = j, i = 1, …, n. With this addition the log-likelihood of the model M_P becomes −120.58 versus −120.76 of the model M_{P4}, showing a negligible gain. On the contrary, the log-likelihood of the simple model M_S becomes −120.10, versus −113.20 for the independent SR model M_I and −110.41
Table 5. Posterior means, standard deviations (in parentheses) of the parameters, and plug-in estimate of log-likelihood for the Poisson, simple
and independent SR models with and without common loading rate. The 1826 earthquake is included into zone 4.
Model α (s.d.) ν (s.d.) ρ (s.d.) loge L(θ )
Poisson −2.69 (0.20) −87.85
simple −2.79 (0.39) 0.0078 (0.0069) 0.79 (0.41) −87.37
Model Subunit α (s.d.) ν (s.d.) ρ (s.d.) loge L(θ )
Poisson 1 −4.54 (0.46) −21.82
 2 −4.03 (0.37) −34.31
 3 −4.18 (0.39) −30.32
 4 −4.03 (0.37) −34.31
 −120.76
Independent (ρ_j = ρ) 1 −4.61 (0.64) 0.0585 (0.0436) 0.20 (0.03) −21.19
 2 −4.39 (0.59) 0.0678 (0.0441) ” −32.66
 3 −4.40 (0.73) 0.1037 (0.0584) ” −27.19
 4 −5.84 (1.20) 0.1047 (0.0427) ” −29.36
−110.40
Independent 1 −4.44 (0.85) 0.0441 (0.0397) 0.22 (0.21) −21.37
2 −4.22 (0.75) 0.0542 (0.0454) 0.24 (0.21) −33.89
3 −4.57 (0.92) 0.0781 (0.0576) 0.26 (0.17) −29.23
4 −6.46 (1.52) 0.0994 (0.0431) 0.25 (0.08) −28.72
−113.20
Table 6. Scale suggested by Jeffreys (1961) for interpreting the Bayes factor.

Evidence against M1   B12              Log10 B12   |  Evidence in favour of M1   B12            Log10 B12
Slight                (0.31623, 1)     (−0.5, 0)   |  Slight                     (1, 3.1623)    (0, 0.5)
Moderate              (0.1, 0.31623)   (−1, −0.5)  |  Moderate                   (3.1623, 10)   (0.5, 1)
Strong                (0.01, 0.1)      (−2, −1)    |  Strong                     (10, 100)      (1, 2)
Decisive              (0, 0.01)        (−∞, −2)    |  Decisive                   (100, ∞)       (2, ∞)
Table 7. Marginal likelihood and pairwise Bayes factors of the models considered.

                  MP       MS       MP4      MI       MIρ
Log10 pr(D | M)   −52.78   −52.96   −53.71   −51.26   −50.59

Log10(B_{1,2}), with model M1 in the columns and model M2 in the rows:

M2 \ M1   MP       MS       MP4      MI       MIρ
MP        0        −0.19    −0.93    1.52     2.19
MS        0.19     0        −0.75    1.70     2.37
MP4       0.93     0.75     0        2.45     3.12
MI        −1.52    −1.70    −2.45    0        0.67
MIρ       −2.19    −2.37    −3.12    −0.67    0
Table 8. Posterior means, standard deviations (in parentheses) of the parameters, and plug-in estimate of log-likelihood for the Poisson,
simple and independent SR models with and without common loading rate. The 1826 earthquake is included into zone 3.
Model Subunit α (s.d.) ν (s.d.) ρ (s.d.) loge L(θ )
Poisson 1 −4.54 (0.46) −21.82
2 −4.03 (0.37) −34.31
3 −4.03 (0.37) −34.31
4 −4.18 (0.39) −30.32
−120.76
Independent (ρ j = ρ) 1 −4.61 (0.64) 0.0585 (0.0436) 0.20 (0.03) −21.19
2 −4.39 (0.59) 0.0678 (0.0441) ” −32.67
3 −3.99 (0.62) 0.0867 (0.0488) ” −31.67
4 −5.99 (1.20) 0.0989 (0.0432) ” −26.23
−111.76
Independent 1 −4.44 (0.85) 0.0441 (0.0397) 0.22 (0.21) −21.37
2 −4.22 (0.75) 0.0542 (0.0454) 0.24 (0.21) −33.89
3 −4.33 (0.83) 0.0633 (0.0472) 0.28 (0.19) −33.25
4 −6.19 (1.46) 0.0880 (0.0442) 0.24 (0.10) −26.02
−114.53
for the independent model M_{Iρ} with common loading rate. In this case the increase in the likelihood justifies the subdivision into zones. The marginal likelihoods and the pairwise Bayes factors of the five models are summarized in Table 7; the highest values correspond to the Bayes factors of the independent SR models against the Poisson model M_{P4}:

$$\mathrm{Log}_{10}\left(B_{M_{I\rho}, M_{P4}}\right) = 3.12, \qquad \mathrm{Log}_{10}\left(B_{M_{I}, M_{P4}}\right) = 2.45.$$

In both cases there is decisive evidence against the Poisson model M_{P4}; moreover, we note that the data provide significant evidence for the model M_{Iρ} against any other.
The same analysis could be repeated for the results in Table 8, corresponding to the case in which the 1826 earthquake is assigned to subunit 3; the conclusions are clearly the same. We note that the residual analysis has not turned out to be a discriminating factor among the different models; in fact all the models have passed the Kolmogorov–Smirnov test for the uniform distribution of the transformed occurrence times; the p-values obtained for the different models are given in Table 9.
Fig. 3 represents the plug-in estimate of the conditional intensity of the simple (a) and of the independent (b) SR models. Since the addends ψ_j(X_j) of (2) are independent, the conditional intensity function of the independent models can be decomposed into the sum of the conditional intensities related to each subunit; these are represented in Fig. 4(a) for the independent model with different loading rates and in Fig. 4(b) for the independent model with ρ_j = ρ. In Fig. 5 we gather most of the information obtained from the iterative estimation procedure on the conditional intensity function. For the best model according to the Bayes factor, M_{Iρ}, Fig. 5 depicts the plug-in estimate and the ergodic mean of λ_{Iρ}; moreover, if we consider the set of values λ_{Iρ}(t | θ^{(i)}), i = 1, …, 32 000, where the θ^{(i)}’s are drawn from the chain generated by the inferential MCMC method, then we can compute, at each t, the median and the extremes of the shortest interval including 90 per cent of those values.
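The shortest interval containing 90 per cent of the sampled intensity values can be found by sliding a fixed-size window over the order statistics; a small sketch of this pointwise computation:

```python
import numpy as np

def shortest_interval(x, coverage=0.9):
    # Shortest interval containing `coverage` of the values: sort, slide a
    # window of fixed size k over the order statistics, keep the narrowest.
    # Applied pointwise, at each t, to the MCMC draws lambda(t | theta^(i)).
    x = np.sort(np.asarray(x, dtype=float))
    k = int(np.ceil(coverage * x.size))
    widths = x[k - 1:] - x[: x.size - k + 1]
    i = int(widths.argmin())
    return x[i], x[i + k - 1]
```

For a right-skewed sample, as intensity values typically are, this interval sits lower than the equal-tail one.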
The simulation procedure performed for forecasting was repeated M = 5 × 105 times. The histograms of the times-to-the-next event
simulated through the plug-in estimates λS(θ), λI (θ) and λIρ(θ) are represented in Figs 6(a–c), respectively. Fig. 7 depicts the histogram of
Table 9. p-values of the Kolmogorov–Smirnov test for the uniform distribution of
the transformed times of occurrence.
Model MP MS MP4 MI MIρ
p-value 0.15 0.31 0.21 0.26 0.08
Figure 3. Estimated conditional intensity function of the Poisson (dotted line) and simple (solid line) models (a); and of the Poisson (dotted line), independent
SR (solid line), independent SR with common loading rate (dashed line) models (b); each vertical dotted line corresponds to an event and is proportional to its
magnitude.
Figure 4. Estimated conditional intensity function of each subunit in the independent SR model with different loading rate (a) and with common rate (b) (the
1826 earthquake is included into the subunit 4).
Figure 5. Plug-in estimate (short dashed line) and ergodic mean (dotted line) of the conditional intensity of the independent SR model with common loading
rate. At each t, varying the parameters simulated by the MCMC method, a set of values λ(t | θ(i)), i = 1, . . . , 32 000, is obtained; the median value and the
shortest interval including 90 per cent of the values of this set are represented by the solid and the dashed lines, respectively (the 1826 earthquake is included
into the subunit 4).
Figure 6. Histograms of the simulated distribution of the time of the next event for the simple (a) and independent SR model with different loading rate (b)
and independent SR model with common loading rate (c).
Figure 7. Histograms of the simulated distributions of the time of the next event in each subunit obtained by applying the timescale transformation method to
each ψ j (·) of the independent MI model, separately.
the values sampled through each ψ_j(·) of the model M_I for every subunit. As suggested by Lu & Vere-Jones (2001), we have simulated by using, in the conditional intensity, both the point estimates of the parameters and values drawn from the posterior distributions. Summaries of the samples obtained in the two ways are given in Tables 10 and 11, respectively: the mean, the standard deviation, the (1 − α) quantiles for α = 0.1, 0.5, 0.9, and, in the case of the independent SR models, also the probabilities that the next event in the region occurs in a specific subunit.
We have also evaluated the marginal likelihood functions over the time interval 1993–2002 on the basis of the information coming from the calibration period 1650–1992, that is, by integrating (4) with respect to the posterior densities. This is a way to test the ability of the models to predict future data (Forster 2000). The results, in terms of log-likelihood, are: −4.6785 for the Poisson model M_{P4}, −4.5286 for the independent model M_I and −5.2408 for the independent model M_{Iρ}; hence in the extrapolation the assumption that the tectonic loading rate is different for each subunit provides the best fit. However, we have to recognize that this conclusion cannot be significant, as ten years, the length of the period available for this test, is a short span. All the same, we think this example is useful in highlighting the potential of the Bayesian methodologies in a predictive context.
Table 10. Estimated quantiles and mean of the simulated distribution of the time of the first event after 1992, obtained by using the plug-in estimates of the parameters for the considered models; for the independent SR models π_j, j = 1, …, 4, are the estimates of the probabilities that the next event occurs in subunit j.
Model Quantiles tnext (s.d.) π 1 π 2 π 3 π 4
10 per cent 50 per cent 90 per cent
Poisson 1994.52 2003.01 2026.25 2007.44 (14.44)
Simple 1994.48 2002.51 2022.67 2006.07 (12.18)
Poisson (4 subunits) 1994.59 2003.47 2027.78 2008.11 (15.11) 0.18 0.28 0.25 0.29
Independent (ρ j = ρ) 1994.32 2001.25 2017.31 2003.89 (9.61) 0.35 0.50 0.03 0.12
Independent 1993.71 1997.56 2007.16 1999.24 (5.76) 0.24 0.51 0.08 0.17
Table 11. Analogously to Table 10, summaries of the simulated distributions of the time of the first event after 1992 obtained by sampling from the posterior
distribution of each parameter.
Model Quantiles tnext (s.d.) π 1 π 2 π 3 π 4
10 per cent 50 per cent 90 per cent
Poisson 1994.52 2003.16 2027.90 2008.06 (15.73)
Simple 1994.33 2001.82 2023.57 2006.25 (14.25)
Poisson (4 subunits) 1994.59 2003.57 2029.28 2008.66 (16.27) 0.18 0.28 0.25 0.29
Independent (ρ j = ρ) 1993.70 1999.74 2021.01 2004.41 (13.75) 0.34 0.45 0.05 0.16
Independent 1994.11 2000.39 2018.12 2003.89 (11.30) 0.23 0.35 0.09 0.33
5 C O N C L U S I O N S
In this paper, we have shown how a more careful appraisal of the data set can greatly improve the performance of a stochastic model; in fact we have exploited information on the seismogenic sources to split up the region under exam and to fit different versions of the stress release model to the enriched data set. We can conclude that in our case a substantially better fit is obtained by treating the parts of the region as independent units than by treating them as parts of a common unit. Moreover, between the two possibilities, common or different tectonic loading rate, the first has given the best result in terms of Bayes factor. These results match those provided in the literature by Zheng & Vere-Jones (1991, 1994), where large-scale geographical regions of China, Iran and Japan give better fits to the stress release models when broken down into subunits. The class of stress release models seems capable of considerable further ramification; future research could concern possible fault interactions, for which the model to be used is the linked SR model (Bebbington & Harte 2003).
The forecast of the time of the next event is also better when one uses the independent SR models; in fact, in Table 10 the time predicted by these models is 2003.89 or 1999.24 when the loading rate is common or different, respectively, versus 2006.07 for the simple model. We recall that the date of the unique earthquake with M_S ≥ 4.8 that actually occurred after 1992 is 1998.69; it hit subunit 4 (see Fig. 1). Again, considering the quantiles, the probability that the next event occurs within the interval (1993–1999) is clearly larger for the independent SR models than for the simple one. As regards the probabilities π_j, j = 1, …, 4, we have to resort to the simulation based on the parameter values sampled from the posterior distributions at every iteration (see Table 11) to obtain, in agreement with the real occurrence of the last earthquake in subunit 4, that π_4 is very similar to π_2 = max_{j=1,…,4} π_j for the models M_{P4}, M_I.
In conclusion, the winning combination seems to be that of the stress release model with a partition of the region into subunits related to seismogenic sources; while it is clear that the model M_{Iρ} fits the observations better in the backward analysis, the model M_I seems preferable in the predictive analysis, but the number of occurrences and the length of the time window make this latter result inconclusive.
A C K N O W L E D G M E N T S
The authors wish to express their sincere thanks to the two anonymous reviewers for their precise and thoughtful comments. The map was
produced by using the GMT software (Wessel & Smith 1998). This work was funded by the Italian Dipartimento della Protezione Civile in
the frame of the 2004–2006 Agreement with Istituto Nazionale di Geofisica e Vulcanologia - INGV.
R E F E R E N C E S
Bebbington, M. & Harte, D.S., 2003. The linked stress release model for
spatio-temporal seismicity: formulations, procedures and applications,
Geophys. J. Int., 154(3), 925–946.
Bishop, C.M., 1995. Neural Networks for Pattern Recognition, Clarendon
Press, Oxford
Boschi, E., Guidoboni, E., Ferrari, G., Valensise, G. & Gasperini, P., 1997.
Catalogo dei Forti Terremoti in Italia dal 461 a.C. al 1990, ING and SGA,
Bologna, pp. 644.
Camassi, R. & Stucchi, M., 1997. NT4.1.1 un catalogo paramet-
rico di terremoti di area italiana al di sopra del danno, Gruppo
Nazionale per la Difesa dai Terremoti, Milano, 95 pp., available from
http://emidius.itim.mi.cnr.it/NT
Cinti, F.R., Faenza, L., Marzocchi, W. & Montone, P., 2004. Probability map
of the next M ≥ 5.5 earthquakes in Italy, Geochem. Geophys. Geosyst.,
5, Q11003. doi:10.1029/2004GC000724.
CPTI Working Group, 2004. Catalogo Parametrico dei Terremoti Ital-
iani - Version 2004 (CPTI04). INGV, Bologna (I); available on
http://emidius.mi.ingv.it/CPTI/
Cucci, L., D’Addezio, G., Valensise, G. & Burrato, P., 1996. Investigating
seismogenic faults in central and southern Apennines (Italy): modelling
of fault related landscape features, Ann. Geophys., 94, 603–618.
Faenza, L., Marzocchi, W. & Boschi, E., 2003. A non-parametric haz-
ard model to characterize the spatio-temporal occurrence of large earth-
quakes; an application to the Italian catalogue, Geophys. J. Int., 155, 521–
531.
Faenza, L., Marzocchi, W., Lombardi, A.M. & Console, R., 2004. Some in-
sights into the time clustering of large earthquakes in Italy, Ann. Geophys.,
47(5), 1635–1640.
Forster, M.R., 2000. Key concepts in model selection: performance and gen-
eralizability, J. Math. Psychol., 44, 205–231.
Gilks, W.R., Richardson, S. & Spiegelhalter, D.J., eds. 1995. Markov chain
Monte Carlo in Practice, Chapman & Hall, London.
Hawkes, A.G., 1971. Point spectra of some mutually exciting point processes,
J. Roy. Statist. Soc., B33, 438–443.
Hawkes, A.G. & Oakes, D.A., 1974. A cluster process representation of
self-exciting process, J. Appl. Probab., 11, 493–503.
Hoeting, J.A., Madigan, D., Raftery, A.E. & Volinsky, C.T., 1999. Bayesian
model averaging: a tutorial, Statist. Sci., 14(4), 382–417.
Jeffreys, H., 1961. Theory of Probability, Oxford, Clarendon Press.
Kagan, Y.Y., 1991. Likelihood analysis of earthquake catalogue, Geophys. J.
Int., 106, 135–148.
Kagan, Y.Y., 1997. Are earthquakes predictable?, Geophys. J. Int., 131, 505–
525.
Kagan, Y.Y. & Jackson, D.D., 2000. Probabilistic forecasting of earthquakes,
Geophys. J. Int., 143, 438–453.
Kagan, Y.Y., Rong, Y.F. & Jackson, D.D., 2003. Probabilistic forecasting of
seismicity, Chapter 5.2 in Earthquake Science and Seismic Risk Reduc-
tion, pp. 185–200, eds Mulargia, F. & Geller, R.J., Kluwer, Dordrecht.
Kass, R.E. & Raftery, A.E., 1995. Bayes factors, J. Amer. Statist. Ass., 90,
430, 773–795.
Lu, C. & Vere-Jones, D., 2001. Statistical analysis of synthetic earthquake
catalogs generated by models with various levels of faults zone disorder,
J. geophys. Res., 106(B6), 11 115–11 125.
Matsu’ura, R.S., 1986. Precursory quiescence and recovery of aftershock
activities before some large aftershocks, Bull. Earthquake Res. Instit.,
University of Tokyo, 61, 1–65.
Meletti, C., Patacca, E. & Scandone, P., 2000. Construction of a seismotec-
tonic model: the case of Italy, Pure. appl. Geophys., 157, 11–35.
MPS04 Working Group, 2004. Redazione della mappa di pericolosita sis-
mica prevista dall’Ordinanza PCM 20.03.03 n. 3274 - App. 2. INGV,
Milano-Roma (I), pp. 38.
Newton, M.A. & Raftery, A.E., 1994. Approximate Bayesian inference with
the weighted likelihood bootstrap, J.R. Statist. Soc., B, 56(1), 3–48.
Nishenko, S.P., 1991. Circum-Pacific seismic potential: 1989–1999, Pure.
appl. Geophys., 135, 169–259.
Ogata, Y., 1981. On Lewis’ simulation method for point processes, IEEE
Trans. Inform. Theory IT-30, 23–31.
Ogata, Y., 1988. Statistical models for earthquake occurrences and residual
analysis for point processes, J. Amer. Statist. Assoc., 83, 401, 9–27.
Ogata, Y., 1997. Detection of precursory relative quiescence before great
earthquakes through a statistical model, J. geophys. Res., 97, 19 845–
19 871.
Ogata, Y., 1999. Seismicity analysis through point-process modeling: a re-
view, In Seismicity Patterns, Their Statistical Significance and Physical
Meaning (eds. M. Wyss, K. Shimazaki and A. Ito) (Birkhauser, Basel),
Pure. appl. Geophys., 155, 471–507.
Raftery, A.E., Newton, M.A., Satagopan, J.M. & Krivitsky, P.N., 2006. Es-
timating the integrated likelihood via posterior simulation using the har-
monic mean identity, Proc. Valencia/ISBA 8th World Meeting on Bayesian
Statistics, Benidorm, (Alicante, Spain), June 1–6, 2006.
Rotondi, R. & Garavaglia, E., 2002. Statistical analysis of the completeness
of a seismic catalogue, Natural Hazards, 25(3), 245–258.
Rotondi, R. & Varini, E., 2003a. Bayesian analysis of a marked point pro-
cess: application in seismic hazard assessment, Stat. Methods Appl., 12,
79–92.
Rotondi, R. & Varini, E., 2003b. Bayesian analysis of linked stress release
models, Proceedings of the meeting on ‘Complex models and computer-
intensive methods for estimation and prediction’, Treviso (I), 4–6
September 2003, 356–361.
Valensise, G. & Pantosti, D., 2001a. The investigation of potential earthquake
sources in peninsular Italy: a review. J. Seism., 5, 287–306.
Valensise, G. & Pantosti, D. (eds.), 2001b. Database of Potential Sources
for Earthquakes Larger than M 5.5 in Italy. Ann. Geophys., 44, 4, Suppl.,
797–964, with CD-ROM - version DISS 2.0, version DISS3.0 available
on http://www.ingv.it/%7ewwwpaleo/DISS3/
Valensise, G., Pantosti, D., D’Addezio, G., Cinti, F.R. & Cucci, L.,
1993. L’identificazione e la caratterizzazione di faglie sismogenetiche
nell’Appennino centro-meridionale e nell’arco calabro: nuovi risultati e
ipotesi interpretative, Proc. XII Meeting Gruppo Nazionale di Geofisica
della Terra Solida, Rome, 331–342.
Vere-Jones, D., 1978. Earthquake prediction: a statistician’s view, J. Phys.
Earth, 26, 129–146.
Vere-Jones, D. & Yonglu, D., 1988. A point process analysis of histori-
cal earthquakes from North China. Earthquake Res. China, 2(2), 165–
181.
Wang, A.L., Vere-Jones, D. & Zheng, X., 1991. Simulation and estimation
procedures for stress release models, in Stochastic Processes and Their
Applications, Vol. 370, pp. 11–27, eds Beckmann, M.J., Gopalan, M.N. &
Subramanian, R., Springer Lecture Notes in Economics and Mathematical
Systems.
Wasserman, L., 2000. Bayesian model selection and model averaging, J.
Math. Psychology, 44, 92–107.
Wessel, P. & Smith, W.H.F., 1998. New, improved version of Generic Map-
ping Tools released, EOS Trans. Am. Geophys. U., 79 (47), pp. 579.
Zheng, X. & Vere-Jones, D., 1991. Application of stress release models to
historical earthquakes from North China. Pure. appl. Geophys., 135, 4,
559–576.
Zheng, X. & Vere-Jones, D., 1994. Further applications of the stochastic
stress release model to historical earthquake data, Tectonophysics, 229,
101–121.
A P P E N D I X A : M A R K O V C H A I N M O N T E C A R L O M E T H O D S
All relevant information on the unknown parameters θ given the observed data y can be inferred from the posterior distribution p(θ | y) by evaluating integrals of the form

$$J = \int f(\theta)\, p(\theta \mid y)\, d\theta$$

of some function f(θ) with respect to the posterior distribution p(θ | y). For example, point estimates of the unknown parameters are given by the posterior means, i.e. f(θ) = θ, and measures of their uncertainty by the variances, i.e. f(θ) = (θ − E(θ))². When it is not possible to evaluate these integrals analytically, we can resort to simulation if we can construct a Markov chain whose equilibrium distribution is p(θ | y). Under suitable regularity conditions, if θ_1, θ_2, …, θ_t, … is a realization from an appropriate chain, the following results hold:

$$\theta_t \to \theta \sim p(\theta \mid y) \quad \text{in distribution}$$
and

$$\hat{J} = \frac{1}{t}\sum_{i=1}^{t} f(\theta_i) \to E_{\theta \mid y}\{f(\theta)\} \quad \text{almost surely}.$$
The most commonly used algorithms to set up such chains are the Gibbs sampler and the Metropolis–Hastings algorithm. For a complete review of this topic we refer to Gilks et al. (1995). The Metropolis–Hastings algorithm constructs a Markov chain converging to p(θ | y) by defining the transition probability from θ_t = θ to the next realised state θ_{t+1} as follows. Let q(θ, θ′) denote a transition probability function such that, if θ_t = θ, the vector θ′ drawn from q(θ, θ′) is considered as a proposed value for θ_{t+1}. It is accepted with probability

$$\alpha(\theta, \theta') = \min\left\{\frac{p(\theta' \mid y)\, q(\theta', \theta)}{p(\theta \mid y)\, q(\theta, \theta')},\; 1\right\}; \qquad \text{(B1)}$$
otherwise it is rejected and we set θ_{t+1} = θ. In this way the transition probabilities of the Markov chain are given by

$$p(\theta, \theta') = \begin{cases} q(\theta, \theta')\, \alpha(\theta, \theta') & \text{if } \theta' \neq \theta \\ 1 - \sum_{\theta'' \neq \theta} q(\theta, \theta'')\, \alpha(\theta, \theta'') & \text{if } \theta' = \theta. \end{cases}$$

It is easy to check that p(θ | y) p(θ, θ′) = p(θ′ | y) p(θ′, θ), which, together with the irreducibility and aperiodicity of q(θ, θ′) on a suitable state space, is a sufficient condition for p(θ | y) to be the equilibrium distribution of the constructed chain. It is important to observe that it is sufficient to be able to evaluate the distribution p(θ | y) up to proportionality (given by the likelihood multiplied by the prior), as it enters (B1) only through the ratio p(θ′ | y)/p(θ | y).
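As an illustration of the algorithm (not of the samplers actually used in this paper), here is a random-walk Metropolis–Hastings sketch for a one-dimensional unnormalized target:

```python
import numpy as np

rng = np.random.default_rng(2)

def random_walk_mh(log_post, theta0, n_iter, scale=1.0):
    # Random-walk Metropolis-Hastings: with a symmetric Gaussian proposal,
    # q(theta', theta) = q(theta, theta'), so the acceptance probability (B1)
    # reduces to min{ p(theta'|y)/p(theta|y), 1 }, evaluated on the log scale
    # and only up to the normalizing constant.
    theta, lp = theta0, log_post(theta0)
    chain = np.empty(n_iter)
    for t in range(n_iter):
        prop = theta + scale * rng.standard_normal()
        lp_prop = log_post(prop)
        if np.log(rng.uniform()) < lp_prop - lp:  # accept the proposal
            theta, lp = prop, lp_prop
        chain[t] = theta                          # on rejection keep theta
    return chain

# target: standard normal density, known only up to proportionality
chain = random_walk_mh(lambda th: -0.5 * th * th, 0.0, 20_000, scale=2.0)
```

Note that `log_post` never needs the normalizing constant, which is precisely the point made above.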
In the Gibbs sampler the transition probabilities are given by the full conditional posterior distributions of the parameters. In general, if we denote by θ = (θ_1, θ_2, …, θ_p) the parameter vector, the Gibbs sampler proceeds iteratively, generating from the conditional posterior distribution

$$\theta_j^{(t+1)} \sim p\left(\theta_j \,\middle|\, \theta_1^{(t+1)}, \ldots, \theta_{j-1}^{(t+1)}, \theta_{j+1}^{(t)}, \ldots, \theta_p^{(t)}, y\right)$$

for j = 1, …, p; we note that in the Gibbs sampler the newly generated value is always accepted.
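A minimal illustration of the Gibbs sampler, on a toy bivariate normal target rather than one of the models of this paper:

```python
import numpy as np

rng = np.random.default_rng(3)

def gibbs_bivariate_normal(r, n_iter):
    # Gibbs sampler for a standard bivariate normal with correlation r:
    # each component is drawn in turn from its full conditional given the
    # current value of the other, and every draw is accepted.
    x = y = 0.0
    s = np.sqrt(1.0 - r * r)        # conditional standard deviation
    out = np.empty((n_iter, 2))
    for t in range(n_iter):
        x = rng.normal(r * y, s)    # p(x | y)
        y = rng.normal(r * x, s)    # p(y | x)
        out[t] = x, y
    return out

sample = gibbs_bivariate_normal(0.8, 20_000)
```

The empirical correlation of the chain recovers the target correlation, confirming convergence to the joint distribution.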