
Journal of Statistical Planning and Inference 138 (2008) 1258–1270
www.elsevier.com/locate/jspi

Robust likelihood functions in Bayesian inference

Luca Greco a,∗, Walter Racugno b, Laura Ventura c

a PE.ME.IS. Department, University of Sannio, P.zza Arechi II, 82100 Benevento, Italy
b Department of Mathematics, University of Cagliari, Italy
c Department of Statistics, University of Padua, Italy

Received 28 February 2007; received in revised form 26 April 2007; accepted 2 May 2007
Available online 22 May 2007

Abstract

In order to deal with mild deviations from the assumed parametric model, we propose a procedure for accounting for model uncertainty in the Bayesian framework. In particular, in the derivation of posterior distributions, we discuss the use of robust pseudo-likelihoods, which offer the advantage of preventing the effects caused by model misspecifications, i.e. when the underlying distribution lies in a neighborhood of the assumed model. The influence functions of posterior summaries, such as the posterior mean, are investigated, as well as the asymptotic properties of robust posterior distributions. Although the use of a pseudo-likelihood cannot be considered orthodox in the Bayesian perspective, it is shown, also through some illustrative examples, how a robust pseudo-likelihood, with the same asymptotic properties as a genuine likelihood, can be useful in the inferential process in order to prevent the effects caused by model misspecifications.
© 2007 Elsevier B.V. All rights reserved.

Keywords: Estimating equation; Influence function; Kullback–Leibler divergence; Model misspecification; Pseudo-likelihood; Posterior distribution; Robustness

1. Introduction

In the frequentist framework, it is well known that mild deviations from the assumed model can give rise to non-negligible changes in inferential results. The concept of robustness related to likelihood functions has been discussed from various points of view in the literature; see, for instance, Heritier and Ronchetti (1994), Tsou and Royall (1995), Lavine (1995) and Markatou et al. (1998). This issue is also addressed in the approach based on the influence function (IF) discussed in Huber (1981) and Hampel et al. (1986), in which unbiased estimating equations are claimed to supply effective tools for robust inference. In this context, robust pseudo-likelihoods can be derived from bounded estimating functions and can be used as genuine likelihoods. In particular, in this paper we focus our attention on the quasi- and the empirical likelihoods (see e.g. Owen, 2001; Adimari and Ventura, 2002a, b), which on the one hand keep the standard first-order properties, and on the other hand take into account model inadequacies. Indeed, they provide inferential procedures which are still reliable and reasonably efficient when the underlying distribution lies in a neighborhood of the assumed model.

∗ Corresponding author. E-mail address: [email protected] (L. Greco).

0378-3758/$ - see front matter © 2007 Elsevier B.V. All rights reserved.
doi:10.1016/j.jspi.2007.05.001


Obviously, even in the Bayesian approach, inferential results depend on the modeling assumptions and on the observed sample, but robust Bayesian statistics is mainly concerned with the global and local sensitivity to prior distributions; see, for instance, Berger (1994), Rios Insua and Ruggeri (2000) and references therein. Indeed, the problem of robustness with respect to sampling model misspecifications has been considered only in a few contributions (Lavine, 1991; Sivaganesan, 1993; Dey et al., 1996; Gustafson, 1996; Shyamalkumar, 2000). Other exceptions are discussed in Pericchi and Perez (1994), Pericchi and Sansò (1995), Dawid (1973), O'Hagan (1979, 1988, 1990), Verdinelli and Wasserman (1991), Passarin (2004) and Andrade and O'Hagan (2006), who address the topics of outliers, robust posterior summaries and model embedding.

The solution of embedding in a larger structure has the cost of eliciting a prior distribution for the extra parameters introduced in the analysis. Moreover, the statistical procedures derived under the supermodel are not necessarily robust in a broad sense, the supermodel being too thin in the space of all distributions. In this paper, we focus on an alternative approach and explore the use of the quasi- and the empirical likelihoods mentioned above. Since these robust pseudo-likelihoods share most of the properties of the genuine likelihood, they can be used as a basis for Bayesian inference to obtain a "robust posterior distribution". Although this way of proceeding cannot in principle be considered orthodox in the Bayesian perspective, it is of interest to evaluate whether the use of a robust likelihood may be useful in the inferential process. Related work on the use of alternative likelihoods in Bayesian inference can be found in Efron (1993), Bertolino and Racugno (1992, 1994), Raftery et al. (1996) and Cabras et al. (2006). Papers more specifically related to the validation of a pseudo-posterior distribution based on an alternative likelihood function are Monahan and Boos (1992), Severini (1999), Lazar (2003), Racugno et al. (2005), Pace et al. (2006) and Schennach (2005).

We discuss two properties of robust posterior distributions, which are derived from the corresponding properties of the robust pseudo-likelihoods involved. First, we show that the robust posterior distributions are asymptotically normal, like the genuine posterior distributions, and that the asymptotic Kullback–Leibler divergence between a robust posterior distribution and the corresponding genuine posterior distribution is determined by the required level of robustness. Moreover, to assess the stability of robust posterior inference, we show that the IFs of summaries of the robust posterior distributions are bounded, contrary to those based on the genuine posterior distribution. A bounded IF is a desirable local stability property of an inferential procedure, since it implies that, for each sample size, in a neighborhood of the assumed model the effect of a small contamination on the procedure cannot become arbitrarily large. The need for bounded influence statistical procedures for estimation, testing and prediction has been stressed by many authors (see, for instance, Hampel et al., 1986; Peracchi, 1990, 1991; Markatou and Ronchetti, 1997), and the aim of this paper is to contribute to the current literature in the Bayesian direction.

A motivating example: Let us consider the well-known Darwin's paired data (Spiegelhalter, 1985) on the heights of self- and cross-fertilized plants. In the Bayesian literature, this example is also discussed by Pericchi and Perez (1994) and Shyamalkumar (2000). It is pointed out that there is not enough evidence to distinguish between the normal model and a distribution with heavy tails, such as the Cauchy and the Student's t distributions, the latter being slightly preferred. Fig. 1(a) shows the normal probability plot of these data. Interest focuses on the location parameter of the distribution of the difference of the heights.

By modeling likelihood uncertainty via the normal, Cauchy and Student's t3 distributions and assuming the same default noninformative prior distribution for the location parameter, we obtain the posterior distributions plotted in Fig. 1(b). Notice that posterior inferential results can depend strongly on the assumed model.

Even though simple, this example is interesting since it illustrates in a practical situation the problem we deal with. In order to handle model uncertainty, we account for the imprecision with respect to the likelihood by considering robust pseudo-likelihoods in Bayesian practice. Their main appeal is that they allow one to assume the simpler sampling model and still deal with mild departures from it. The proposed approach directly embeds model uncertainty in the likelihood function and, in view of this, leads to posterior inferential conclusions which prevent the effects caused by model misspecifications, i.e. when the underlying distribution lies in a neighborhood of the simpler assumed model.

The paper is organized as follows. Section 2 presents a short overview of M-estimators, the IF, and quasi- and empirical likelihood functions. Section 3 discusses robust posterior distributions, focusing on their asymptotic properties and on the computation of the IF of posterior summaries. Section 4 illustrates by explanatory examples and Monte Carlo studies the use of robust pseudo-likelihoods in the Bayesian framework. Finally, some concluding remarks are given in Section 5.


Fig. 1. Darwin's data: normal probability plot and comparison of posterior distributions based on the normal (−), Cauchy (· · ·) and Student's t3 (−−) distributions.

2. M-estimators, IF and quasi- and empirical likelihoods

Let $y = (y_1, \ldots, y_n)$ be a random sample of size $n$, having independent and identically distributed components, according to a distribution function $F_\theta = F(y; \theta)$, with $\theta \in \Theta \subseteq \mathbb{R}^d$, $d \geq 1$. Let $\Psi_\theta = \Psi(y; \theta) = \sum_{i=1}^{n} \psi(y_i; \theta)$ be an unbiased estimating function for $\theta$ based on $y$, such that $E_\theta(\psi(Y; \theta)) = 0$ for every $\theta$. For some purposes, it is enough that the latter expectation is $O(1)$ as $n \to \infty$. A general M-estimator is defined as the root $\tilde{\theta}$ of the estimating equation

$$\Psi_\theta = 0. \qquad (1)$$

It is worth mentioning that the class of M-estimators is rich and includes a variety of well-known estimators. For example, if $\Psi_\theta$ is the likelihood score function, we obtain the maximum likelihood estimator (MLE).
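For concreteness, the following minimal sketch (our addition, not part of the paper) solves the estimating equation (1) numerically for a scalar location parameter with a Huber-type $\psi$; the tuning constant $k = 1.345$ and the simulated data are illustrative choices.

```python
import numpy as np
from scipy.optimize import brentq

def psi(y, theta, k=1.345):
    # Huber-type psi: linear near the center, constant in the tails
    return np.clip(y - theta, -k, k)

def m_estimate(y, k=1.345):
    # Root of the estimating equation (1): sum_i psi(y_i; theta) = 0
    Psi = lambda t: psi(y, t, k).sum()
    return brentq(Psi, y.min() - 1.0, y.max() + 1.0)  # bracket contains the root

rng = np.random.default_rng(0)
y = np.append(rng.normal(size=30), 50.0)  # clean N(0, 1) sample plus one gross outlier
print(np.mean(y), m_estimate(y))          # the mean is dragged by the outlier; the M-estimate is not
```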

Under broad conditions assumed throughout this paper, an M-estimator, like the MLE within a parametric model, is consistent and approximately normal with mean $\theta$ and variance

$$V(\theta) = B(\theta)^{-1} \Omega(\theta) (B(\theta)^{-1})^T, \qquad (2)$$

where $B(\theta) = -E_\theta(\partial \Psi_\theta / \partial \theta^T)$ and $\Omega(\theta) = E_\theta(\Psi_\theta \Psi_\theta^T)$. The same ingredients appearing in (2) also appear in the well-known robust sandwich estimate of the asymptotic variance of the MLE (Huber, 1967), with $\Psi_\theta = \partial \ell(\theta)/\partial \theta^T$; see Royall (1986) for detailed examples.

One way of assessing the robustness of a statistic $T$ which can be represented (at least asymptotically) as a function of the empirical distribution function $F_n$, that is, $T(F_n)$, is by means of the IF, given by

$$\mathrm{IF}(x; T, F_\theta) = \lim_{\varepsilon \to 0} \frac{T(F_\varepsilon) - T(F_\theta)}{\varepsilon} = \left. \frac{\partial T(F_\varepsilon)}{\partial \varepsilon} \right|_{\varepsilon = 0}, \qquad (3)$$

where $T(F_\varepsilon) = T((1 - \varepsilon) F_\theta + \varepsilon \Delta_x)$, with $\Delta_x$ the point mass at $x$. The IF measures the local stability of a statistical procedure, since it describes the effect of a small contamination ($\varepsilon \Delta_x$) at the point $x$, standardized by the mass of the contamination. As a consequence, the linear approximation $\varepsilon\,\mathrm{IF}(x; T, F_\theta)$ describes the asymptotic bias of $T$ under single-point contaminations of the assumed model distribution $F_\theta$. The supremum of the IF, i.e. the gross-error sensitivity, measures the worst influence on $T$, and a desirable robustness property for a statistical procedure is that the gross-error sensitivity is finite, i.e. that the IF is bounded (B-robustness). Note that the IF of the MLE is proportional to its score function and, in general, MLEs have unbounded IF. On the contrary, if $\psi(x; \theta)$ is bounded, then the corresponding M-estimator $\tilde{\theta}$ is B-robust (Hampel et al., 1986). Indeed, the IF is given by

$$\mathrm{IF}(x; \tilde{\theta}, F_\theta) = B(\theta)^{-1} \psi(x; \theta). \qquad (4)$$
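As a small numerical illustration of (4) (our addition), for the Huber location M-estimator under the standard normal model the IF equals $\psi(x; \theta)/B(\theta)$ with $B(\theta) = P(|Y - \theta| \leq k)$, and is therefore capped at $\pm k/B(\theta)$, whereas the IF of the MLE (the sample mean) is $x - \theta$ and grows without bound.

```python
import numpy as np
from scipy.stats import norm

k, theta = 1.345, 0.0
B = norm.cdf(k) - norm.cdf(-k)            # E_theta(-d psi / d theta) for the Huber psi
x = np.array([-10.0, -2.0, 0.0, 2.0, 10.0])
if_huber = np.clip(x - theta, -k, k) / B  # bounded: B-robust
if_mle = x - theta                        # unbounded
print(if_huber)
print(if_mle)
```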


Finally, notice that the IF can also be used to evaluate the asymptotic covariance matrix of $T$, since $V(\theta) = E_\theta(\mathrm{IF}(x; T, F_\theta)\, \mathrm{IF}(x; T, F_\theta)^T)$.

Robust pseudo-likelihoods may be derived from the estimating equation (1) that defines an M-estimator, and their robustness properties are inherited from the corresponding M-estimator (Heritier and Ronchetti, 1994). A first possible robust pseudo-likelihood obtained from the estimating function $\Psi_\theta$ is the quasi-likelihood (see McCullagh, 1991; Adimari and Ventura, 2002a, b), defined as

$$L_Q(\theta) = \exp(\ell_Q(\theta)) = \exp\left( \int_k^\theta A(t) \Psi(y; t)\, \mathrm{d}t \right),$$

where the matrix $A(\theta)$ is such that $A(\theta)^T = \Omega(\theta)^{-1} B(\theta)$ and $k$ is an arbitrary constant. When $d = 1$, $L_Q(\theta)$ is usually easy to derive, and when $d > 1$ it exists if and only if the matrix $B(\theta)$ is symmetric. When $L_Q(\theta)$ exists, it may be used, in a standard way, for setting quasi-likelihood tests and confidence regions, since the matrix $A(\theta)$ allows one to obtain a quasi-likelihood ratio statistic $W_Q(\theta) = 2\{\ell_Q(\tilde{\theta}) - \ell_Q(\theta)\}$ with a standard $\chi^2_d$ distribution. Moreover, the robustness properties of $\tilde{\theta}$ do not depend on $A(\theta)$, because this matrix does not change its IF.

A different robust pseudo-likelihood for $\theta$ can be obtained by computing the empirical likelihood from $\Psi_\theta$. In fact, the empirical likelihood $L_E(\theta)$ (Owen, 2001) is a nonparametric tool which can be derived in several contexts, and in particular for parameters which are determined by estimating equations (Monti and Ronchetti, 1993; Adimari, 1997; Tsao and Zhou, 2001). Although in its natural setting the empirical likelihood does not require the assumption of a parametric model $F_\theta$, when based on unbiased estimating equations that define M-estimators related to $F_\theta$, it may be used as a parametric tool. Its main appeal is that only unbiasedness of $\Psi_\theta$ is required to obtain a standard asymptotic $\chi^2_d$ distribution for the empirical likelihood ratio statistic. On the other hand, due to its nonparametric nature, $L_E(\theta)$ fails when the sample size is very small (see also Adimari and Ventura, 2002b). The empirical likelihood ratio statistic is given by $W_E(\theta) = -2 \log R_E(\theta)$, with $R_E(\theta) = \max_{p_i} \prod_{i=1}^n n p_i$, where the $p_i$-weights satisfy $p_i \geq 0$, $\sum_{i=1}^n p_i = 1$ and $\sum_{i=1}^n \psi(y_i; \theta) p_i = 0$. A Lagrangian argument leads to

$$W_E(\theta) = 2 \sum_{i=1}^n \log\{1 + \lambda^T \psi(y_i; \theta)\}$$

if $0$ is inside the convex hull of $\psi(y_1; \theta), \ldots, \psi(y_n; \theta)$; otherwise, it is adequate to set $W_E(\theta) = +\infty$. The Lagrange multiplier $\lambda = \lambda(\theta)$ satisfies $\sum_{i=1}^n \psi(y_i; \theta)/(1 + \lambda^T \psi(y_i; \theta)) = 0$. Finally, note that, under standard regularity conditions, it can be shown that $W_Q(\theta)$ and $W_E(\theta)$ are equivalent to the first term of their Taylor expansions.
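The following sketch (our addition) evaluates $W_E(\theta)$ for a scalar estimating function by solving the Lagrange multiplier equation above with a bracketed root search; the Huber $\psi$ and the simulated data are illustrative.

```python
import numpy as np
from scipy.optimize import brentq

def W_E(psi_vals):
    # psi_vals: psi(y_i; theta) for a fixed theta; scalar parameter case
    if psi_vals.min() >= 0 or psi_vals.max() <= 0:
        return np.inf                      # 0 outside the convex hull
    eps = 1e-10                            # keep all 1 + lambda * psi_i > 0
    lo = (-1 + eps) / psi_vals.max()
    hi = (-1 + eps) / psi_vals.min()
    g = lambda lam: np.sum(psi_vals / (1 + lam * psi_vals))
    lam = brentq(g, lo, hi)                # g is strictly decreasing on (lo, hi)
    return 2 * np.sum(np.log1p(lam * psi_vals))

rng = np.random.default_rng(1)
y = rng.normal(size=25)
psi_vals = np.clip(y - 0.3, -1.345, 1.345)  # Huber psi at the tested theta = 0.3
print(W_E(psi_vals))                        # approximately chi^2_1 when theta is true
```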

3. Robust posterior distributions

By introducing a general pseudo-likelihood function $L_{PS}(\theta)$ in the Bayesian analysis, a pseudo-posterior distribution can be defined as $\pi_{PS}(\theta; y) \propto \pi(\theta) L_{PS}(\theta)$, where $\pi(\theta)$ denotes a default prior distribution. The properties of robust posterior distributions of the form

$$\pi_Q(\theta; y) \propto \pi(\theta) L_Q(\theta) \quad \text{or} \quad \pi_E(\theta; y) \propto \pi(\theta) L_E(\theta) \qquad (5)$$

are inherited from the pseudo-likelihood used. In general, the properties of a pseudo-likelihood $L_{PS}(\theta)$ are related to the way it is derived. Two different situations for the construction of $L_{PS}(\theta)$ need to be distinguished. In the first case, the pseudo-likelihood is based on a statistical model defined as a reduction of the original model, so that the obtained $L_{PS}(\theta)$ is a genuine likelihood function. On the contrary, when such a reduced model does not exist, all the properties have to be specifically investigated: this is the case of the quasi- and empirical likelihoods. However, it has been shown (see, for instance, Owen, 2001; Pace and Salvan, 1997, Chapter 4; Adimari and Ventura, 2002a; Lazar, 2003) that both $L_Q(\theta)$ and $L_E(\theta)$ have all the desired standard first-order properties which make them resemble a true likelihood and thus useful as a basis for Bayesian inference. In particular, the pseudo-MLE $\tilde{\theta}$, solution of (1), is consistent, asymptotically normal and asymptotically efficient, the pseudo-score function $\Psi_\theta$ has mean zero and is asymptotically normally distributed, and the pseudo-likelihood ratio statistic has an asymptotic chi-squared distribution.
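As an illustration of form (5) (our construction, anticipating the Huber location setting of Section 4), a robust pseudo-posterior can be evaluated on a grid by numerically integrating the quasi-score $A \Psi(y; t)$ to obtain $\ell_Q$ and adding the log prior; the $N(0.5, 1)$ prior and the data are illustrative.

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import cumulative_trapezoid

k = 1.345
B = 2 * norm.cdf(k) - 1                                    # E_theta(-d psi/d theta)
Omega = B - 2 * k * norm.pdf(k) + 2 * k**2 * norm.cdf(-k)  # E_theta(psi^2)
A = B / Omega                                              # scalar A(theta), constant here

rng = np.random.default_rng(2)
y = np.append(rng.normal(size=20), 8.0)                    # one outlier

grid = np.linspace(-2.0, 2.0, 2001)
dx = grid[1] - grid[0]
Psi = np.array([np.clip(y - t, -k, k).sum() for t in grid])
lq = A * cumulative_trapezoid(Psi, grid, initial=0)        # l_Q(theta) up to a constant
log_post = norm.logpdf(grid, 0.5, 1.0) + lq                # log prior + l_Q
post = np.exp(log_post - log_post.max())
post /= post.sum() * dx                                    # normalize on the grid
print(grid[np.argmax(post)], (grid * post).sum() * dx)     # robust posterior mode and mean
```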


3.1. Asymptotic properties of robust posterior distributions

The study of the asymptotic properties of a pseudo-posterior distribution associated with (1) follows essentially the route used for the proper posterior distribution, i.e. the special case where $\Psi_\theta$ is the score function (see e.g. Bernardo and Smith, 2000, Section 5.3). Under regularity conditions, the pseudo-posterior distribution can be written as

$$\pi_{PS}(\theta; y) \propto \exp(\log \pi(\theta) + \ell_{PS}(\theta)),$$

with $\ell_{PS}(\theta) = \log L_{PS}(\theta)$. Expanding the two logarithms in a Taylor series about their respective maxima, i.e. the prior mode $\theta_0$ and $\tilde{\theta}$, respectively, we obtain

$$\log \pi(\theta) = \log \pi(\theta_0) - \tfrac{1}{2} (\theta - \theta_0)^T H_0^{-1} (\theta - \theta_0) + R_0,$$
$$\ell_{PS}(\theta) = \ell_{PS}(\tilde{\theta}) - \tfrac{1}{2} (\theta - \tilde{\theta})^T H(\tilde{\theta})^{-1} (\theta - \tilde{\theta}) + R_n,$$

where

$$H_0^{-1} = -\left. \frac{\partial^2 \log \pi(\theta)}{\partial \theta^T \partial \theta} \right|_{\theta = \theta_0}, \qquad H(\tilde{\theta})^{-1} = -\left. \frac{\partial^2 \ell_{PS}(\theta)}{\partial \theta^T \partial \theta} \right|_{\theta = \tilde{\theta}},$$

and $R_0$ and $R_n$ are remainder terms, which can be assumed small for large $n$. It follows, ignoring terms not depending on $\theta$, that, for large $n$,

$$\pi_{PS}(\theta; y) \propto \exp\left( -\tfrac{1}{2} (\theta - \theta_0)^T H_0^{-1} (\theta - \theta_0) - \tfrac{1}{2} (\theta - \tilde{\theta})^T H(\tilde{\theta})^{-1} (\theta - \tilde{\theta}) \right) \propto \exp\left( -\tfrac{1}{2} (\theta - \theta_m)^T H_m^{-1} (\theta - \theta_m) \right) \qquad (6)$$

with $H_m^{-1} = H_0^{-1} + H(\tilde{\theta})^{-1}$ and $\theta_m = H_m (H_0^{-1} \theta_0 + H(\tilde{\theta})^{-1} \tilde{\theta})$. Development (6) suggests that robust pseudo-posterior distributions of form (5) are asymptotically normal with mean $\theta_m$, which is a weighted average of a prior estimate and a robust observation-based estimate, and precision $H_m^{-1}$, which is the sum of the prior precision matrix and the pseudo-information matrix. Notice that, for large $n$, $\theta_m / \tilde{\theta} \to 1$ and the precision provided by the data tends to be large compared with the prior precision, which becomes negligible; hence $\theta_m$ and $H_m^{-1}$ are essentially indistinguishable from $\tilde{\theta}$ and $H(\tilde{\theta})^{-1}$ (see e.g. Bernardo and Smith, 2000, Section 5.3). Moreover, since $H(\tilde{\theta})^{-1} = V^{-1}(\tilde{\theta}) + o_p(1)$, we may approximate (5) by a multivariate normal distribution with mean $\tilde{\theta}$ and variance $V(\tilde{\theta})$.

The asymptotic approximations of the proper and the pseudo-posterior distributions allow easy computation of the asymptotic Kullback–Leibler divergence between $\pi(\theta; y)$ and $\pi_{PS}(\theta; y)$. Consider for simplicity $d = 1$; the following results can be extended to the more general situation of $d > 1$. Using $\pi(\theta; y) \sim N(\hat{\theta}_{MLE}, i(\theta)^{-1})$, with $i(\theta)$ the information matrix and $\hat{\theta}_{MLE}$ the MLE of $\theta$, and using $\pi_{PS}(\theta; y) \sim N(\tilde{\theta}, V(\tilde{\theta}))$, it can be shown that

$$KL = KL(\pi(\theta; y); \pi_{PS}(\theta; y)) = \int \log \frac{\pi(\theta; y)}{\pi_{PS}(\theta; y)}\, \pi(\theta; y)\, \mathrm{d}\theta = \frac{1}{2} \left[ -\log e - 1 + e + e\, \frac{(\hat{\theta}_{MLE} - \tilde{\theta})^2}{i(\theta)^{-1}} \right], \qquad (7)$$

where $e = i(\theta)^{-1} / V(\tilde{\theta}) = [i(\theta) V(\tilde{\theta})]^{-1}$ is the asymptotic efficiency of the estimator $\tilde{\theta}$ (see e.g. Hampel et al., 1986, Chapter 2). As is well known in classical robustness theory, to build a B-robust estimator we have to put a bound on its IF. However, doing so leads to an efficiency loss at the assumed model, and therefore it is necessary to establish a trade-off between robustness and efficiency. Typically, a usual choice is to fix an efficiency loss of, say, 5% under the assumed parametric model and to consider the consequent bound on the IF. This means that, for example, the robust estimator $\tilde{\theta}$ can be built to achieve 95% efficiency at the model, i.e. $e = 0.95$, and in this situation the first term $[-\log e - 1 + e]$ in (7) reduces to 0.00129. Furthermore, it is interesting to note that the last term in (7) is related to the extension of the Mahalanobis distance between two normal populations with proportional variances (see e.g. Paranjpe and Gore, 1994). This term is of order $O(1)$ and its value will depend on the specific assumed model and on the assumed robust estimator (see Section 4). In view of this, the asymptotic Kullback–Leibler divergence between $\pi(\theta; y)$ and $\pi_{PS}(\theta; y)$ turns out to be determined by the required level of robustness.
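A one-line numerical check (ours) of the value quoted above:

```python
import math
e = 0.95
print(-math.log(e) - 1 + e)   # 0.001293..., i.e. the 0.00129 stated in the text
```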

3.2. IF of posterior summaries

In this section we investigate the IF of the posterior moments. To this end, we first derive the IF of the posterior mode, which is easy to compute, and then use this result to compute the IF of posterior moments. By considering $\log \pi_{PS}(\theta; y)$, the mode $\tilde{\theta}$ of the pseudo-posterior distribution is the solution of the estimating equation

$$\frac{\partial \log \pi_{PS}(\theta; y)}{\partial \theta} = \pi'(\theta) + \frac{\partial \ell_{PS}(\theta)}{\partial \theta} = 0, \qquad (8)$$

where $\pi'(\theta) = \partial \log \pi(\theta)/\partial \theta$, which is of form (1). Eq. (8) can be written as $\pi'(\theta) + A(\theta) \Psi(y; \theta) = 0$ or, equivalently, as

$$\sum_{i=1}^n A(\theta) [g(\theta) + \psi(y_i; \theta)] = 0. \qquad (9)$$

From (9), for the quasi-loglikelihood function we have $g(\theta) = A(\theta)^{-1} \pi'(\theta)/n$, whereas for the empirical likelihood we have $g(\theta) = \pi'(\theta)/n$, since $A(\theta) = I_d$, where $I_d$ is the identity matrix. Finally, for the genuine likelihood we have $A(\theta) = I_d$, $g(\theta) = \pi'(\theta)/n$ and $\psi(y_i; \theta) = \partial \ell(\theta; y_i)/\partial \theta$, for $i = 1, \ldots, n$.

It is easy to compute the IF of the posterior mode $\tilde{\theta}$ since, in view of (4), it is simply proportional to the estimating function in (9). In particular, for the posterior mode $\tilde{\theta}$ the IF at a point $x$ is given by

$$\mathrm{IF}(x; \tilde{\theta}, F_\theta) \propto A(\theta) [g(\theta) + \psi(x; \theta)]. \qquad (10)$$

As a consequence, we can see that the IF of a posterior mode is bounded if and only if the function $\psi(\cdot; \theta)$ is bounded. This is true if a robust likelihood function is used in (5), while in general it is not true if the genuine likelihood is considered.

Let us now consider moments of posterior distributions of form (5), given by $T_\pi = \int h(\theta) \exp(\log \pi_{PS}(\theta; y))\, \mathrm{d}\theta$. For instance, if $h(\theta) = \theta$, $T_\pi$ is the posterior mean $E_{PS}(\theta; y)$. To compute $T_\pi$, we use a simple approximation based on Laplace's method (Tierney and Kadane, 1986), which provides $T_\pi = h(\tilde{\theta})(1 + R(\tilde{\theta}))$, where $\tilde{\theta}$ is the mode of $\pi_{PS}(\theta; y)$. The remainder term $R(\tilde{\theta})$, of order $O(n^{-1})$, can be expressed as a function of derivatives of $h(\theta)$ and $\pi_{PS}(\theta; y)$ evaluated at $\tilde{\theta}$. The IF of a posterior moment can then be computed simply using the approximation $T_\pi$. In particular, denoting by $\tilde{\theta}_\varepsilon$ the estimator computed with respect to the contaminated model $(1 - \varepsilon) F_\theta + \varepsilon \Delta_x$, we obtain

$$\mathrm{IF}(x; T_\pi, F_\theta) = \left. \frac{\partial T_{\pi \varepsilon}}{\partial \varepsilon} \right|_{\varepsilon = 0} = \left. \frac{\partial [h(\tilde{\theta}_\varepsilon)(1 + R(\tilde{\theta}_\varepsilon))]}{\partial \varepsilon} \right|_{\varepsilon = 0} = \left. \left( \frac{\partial \tilde{\theta}_\varepsilon}{\partial \varepsilon} \right) \left[ \left( \frac{\partial h(\tilde{\theta}_\varepsilon)}{\partial \tilde{\theta}_\varepsilon} \right) (1 + R(\tilde{\theta}_\varepsilon)) + h(\tilde{\theta}_\varepsilon) \left( \frac{\partial R(\tilde{\theta}_\varepsilon)}{\partial \tilde{\theta}_\varepsilon} \right) \right] \right|_{\varepsilon = 0} \propto \mathrm{IF}(x; \tilde{\theta}, F_\theta). \qquad (11)$$

In view of (10) and (11), a posterior moment has bounded IF if and only if the posterior mode has bounded IF. These results state that the summaries of the robust posterior distributions are stable with respect to small contaminations, and thus the proposed procedure prevents the effects caused by model misspecifications, outliers or influential observations. On the contrary, for the genuine posterior distribution, small deviations from the assumed model can have strong effects on posterior summaries. Finally, it is interesting to note that in this context the robustness of the inferential conclusions does not depend on the particular posterior summary considered, but on the likelihood function used in the posterior distribution.
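A small simulation (our addition) makes (10) and (11) concrete: replacing one observation by $x$ and re-maximizing, the mode of the genuine normal posterior tracks $x$, while the mode of the Huber quasi-posterior stabilizes once $x$ leaves the central band; the $N(0.5, 1)$ prior and the data are illustrative.

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import brentq

k = 1.345
B = 2 * norm.cdf(k) - 1
Omega = B - 2 * k * norm.pdf(k) + 2 * k**2 * norm.cdf(-k)
A = B / Omega                                    # scalar A(theta), constant here

rng = np.random.default_rng(6)
y = rng.normal(size=20)

def post_mode(y, robust, mu0=0.5, tau2=1.0):
    # Root of (8): prior score plus (quasi-)score, both decreasing in theta
    def score(t):
        s = A * np.clip(y - t, -k, k).sum() if robust else (y - t).sum()
        return (mu0 - t) / tau2 + s
    return brentq(score, -50.0, 50.0)

for x in (0.0, 5.0, 50.0):
    y[-1] = x
    print(x, round(post_mode(y, False), 3), round(post_mode(y, True), 3))
```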

4. Examples and Monte Carlo studies

In this section we consider some examples in order to investigate the finite-sample behavior of the inferential procedures based on robust posterior distributions (5). Interest is focused on location and scale models. Although the examples discussed here are purely illustrative, the construction of (5) can be carried out without particular effort also for more complex situations and in the presence of nuisance parameters.

In Examples 4.2 and 4.3 we perform Monte Carlo experiments whose objective is twofold: first, to evaluate the discrepancy between the robust posterior distributions and the corresponding genuine posterior distribution in terms of the Kullback–Leibler divergence; second, to assess the accuracy and stability of robust posterior summaries both when the model is correctly specified and under arbitrary departures from the assumed model. We consider $\varepsilon$-neighborhood contamination models of the form $F_\varepsilon = (1 - \varepsilon) F + \varepsilon H$, where $H(\cdot)$ denotes the contaminating distribution and $\varepsilon$ is the contamination percentage. We set $\varepsilon = 0.10$ and performed 1000 trials for each simulation.

4.1. A motivating example (continued)

Let us consider the performance of (5) for Darwin's data. The quantity of interest is the location parameter of the distribution of the difference of the heights, and the quasi- and empirical likelihoods can be derived from a classical Huber-type estimating function (see Hampel et al., 1986, Chapter 2), given by

$$\psi(y; \theta) = (y - \theta) \min(1, k/|y - \theta|), \qquad (12)$$

where $\theta$ is a location parameter and $k > 0$ is a fixed constant, which can be interpreted as the regulator between robustness and efficiency: for a lower $k$ one gains robustness but loses efficiency, and vice versa. Typically, $k$ is chosen to achieve 95% efficiency at the assumed model, i.e. $e = 0.95$.
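A quick check (our addition) that $k = 1.345$ gives roughly 95% efficiency at the normal model, using $e = B(\theta)^2/\Omega(\theta)$ with $B(\theta) = P(|Z| \leq k)$ and $\Omega(\theta) = E(\psi(Z)^2)$ for $Z \sim N(0, 1)$, since the MLE has asymptotic variance 1:

```python
from scipy.stats import norm

def efficiency(k):
    B = 2 * norm.cdf(k) - 1                                    # P(|Z| <= k)
    Omega = B - 2 * k * norm.pdf(k) + 2 * k**2 * norm.cdf(-k)  # E(psi(Z)^2)
    return B**2 / Omega

print(efficiency(1.345))   # about 0.95
```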

Table 1 gives the values of the posterior mean and of the 0.95 highest posterior density interval (HPD) for the location parameter derived from the proper posterior distributions based on the normal, Cauchy and Student's t3 models, and also from the quasi- and empirical posterior distributions based on the simpler normal model. All the posterior distributions are displayed in Fig. 2, together with the plots of minus twice the normalized loglikelihoods. In this case, the quasi-loglikelihood function for $\theta$ is given by

$$\ell_Q(\theta) = A\theta \sum_{i=1}^n y_i I_{(y_i - k \leq \theta \leq y_i + k)} - \frac{A\theta^2}{2} \sum_{i=1}^n I_{(y_i - k \leq \theta \leq y_i + k)} + A \sum_{i=1}^n (k\theta + c_{1i}) I_{(\theta < y_i - k)} - A \sum_{i=1}^n (k\theta + c_{2i}) I_{(\theta > y_i + k)}, \qquad (13)$$

where $I_{(\cdot)}$ denotes the indicator function, $c_{1i} = (y_i - k)^2/2$ and $c_{2i} = -(y_i + k)^2/2$, $i = 1, \ldots, n$, are such that $\ell_Q(\theta)$ is continuous, and $A = [\Phi(k) - \Phi(-k)]/[2(k^2 \Phi(-k) - k\phi(k) + \Phi(k) - 1/2)]$, with $\Phi(\cdot)$ and $\phi(\cdot)$ the standard normal distribution and density functions, respectively. On the contrary, a closed form expression for the empirical likelihood is not available.
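The following is a direct transcription of (13) into code (ours, unchecked against the authors' implementation); the data vector is illustrative, not Darwin's.

```python
import numpy as np
from scipy.stats import norm

def l_Q(theta, y, k=1.345):
    # A = B/Omega, written exactly as below (13)
    A = (norm.cdf(k) - norm.cdf(-k)) / (
        2 * (k**2 * norm.cdf(-k) - k * norm.pdf(k) + norm.cdf(k) - 0.5))
    mid = (y - k <= theta) & (theta <= y + k)    # observations with |y_i - theta| <= k
    lo = theta < y - k                           # theta below the linear band
    hi = theta > y + k                           # theta above the linear band
    c1 = (y - k) ** 2 / 2
    c2 = -(y + k) ** 2 / 2
    return (A * theta * y[mid].sum()
            - A * theta ** 2 / 2 * mid.sum()
            + A * (k * theta + c1)[lo].sum()
            - A * (k * theta + c2)[hi].sum())

y = np.array([23.5, 12.0, 21.0, -67.0])          # illustrative data
print(l_Q(20.0, y) - l_Q(0.0, y))                # only differences of l_Q matter
```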

Table 1
Darwin's data: posterior mean and 0.95 HPD for the location parameter

        Normal          t3              Cauchy          πQ(θ; y)        πE(θ; y)
Mean    20.58           26.27           26.27           25.74           22.69
HPD     (7.93, 33.23)   (11.16, 41.15)  (10.86, 41.72)  (12.09, 38.97)  (0.23, 42.04)


Fig. 2. Darwin's data: comparison of posterior distributions and of minus twice the normalized loglikelihood functions (normal, t3, Cauchy, quasi- and empirical).

From Table 1 and Fig. 2 it can be noticed that the posterior means of $\pi_Q(\theta; y)$ and $\pi_E(\theta; y)$ exhibit their robustness properties, since they are very similar to the means of the proper posterior distributions based on the t3 and Cauchy models. On the contrary, the posterior mean of $\pi(\theta; y)$ based on the normal model is shifted to the left, since it is affected by the two smallest values of the data set. Indeed, in this case, the IF of the posterior mean of $\pi(\theta; y)$ is proportional to $\mathrm{IF}(x; T_\pi, F_\theta) \propto (x - \theta)$ (see also Example 4.2). Moreover, it is important to notice that the HPDs associated with the proper posterior distributions based on the t3 and Cauchy models are wider than the one associated with $\pi_Q(\theta; y)$, which avoids using a heavy-tailed sampling model. This does not happen when using $\pi_E(\theta; y)$, which shows a marked asymmetry to the left. The reason is that a characteristic of $L_E(\theta)$ is that it takes into account the configuration of the data and the sample size.

4.2. Example: location model

Let us consider $n$ observations drawn from a normal population $N(\theta, 1)$, assuming as prior the conjugate distribution $\pi(\theta) \sim N(\mu_0, \tau^2)$, with $\mu_0$ and $\tau^2$ fixed. The posterior mean and the posterior mode are equal to $E(\theta; y) = (n\bar{y}\tau^2 + \mu_0)/(1 + n\tau^2)$, which clearly has unbounded IF, since the latter is proportional to $\mathrm{IF}(x; T_\pi, F_\theta) \propto (x - \theta)$. If we write $\bar{y} = (n - 1)\bar{y}_{(n-1)}/n + y_n/n$, where $\bar{y}_{(n-1)}$ is the sample mean computed on $(n - 1)$ observations, then

$$E(\theta; y) = \frac{y_n \tau^2}{1 + n\tau^2} + \frac{(n - 1)\bar{y}_{(n-1)}\tau^2 + \mu_0}{1 + n\tau^2}. \qquad (14)$$

By allowing $y_n$ to vary, the posterior mean (14) is largely affected for any assumed values of $\mu_0$ and $\tau^2$, since one observation may have unbounded influence on $E(\theta; y)$. On the contrary, if we consider $\pi_Q(\theta; y)$ or $\pi_E(\theta; y)$ derived from a classical Huber-type estimating function of form (12), then the IF of the posterior mean is proportional to $\mathrm{IF}(x; T_\pi, F_\theta) \propto \psi(x; \theta)$, which is clearly bounded.
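A minimal sketch (ours) of the empirical IF of (14): letting $y_n$ vary shows the linear, unbounded influence of a single observation on the conjugate posterior mean.

```python
import numpy as np

rng = np.random.default_rng(3)
tau2, mu0, n = 1.0, 0.5, 10
y = rng.normal(size=n)
for yn in (-5.0, 0.0, 5.0, 50.0):
    y[-1] = yn
    post_mean = (n * y.mean() * tau2 + mu0) / (1 + n * tau2)  # formula (14)
    print(yn, round(post_mean, 3))   # grows linearly in y_n: unbounded influence
```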

Sensitivity analysis: A plot of (14) as a function of $y_n$ yields the empirical IF of the posterior mean. This function is also known as the sensitivity curve (Hampel et al., 1986, Chapter 2; see also Passarin, 2004, for an application in the Bayesian setting), which in general enables us to understand whether model assumptions can become inadequate. To illustrate the use of the sensitivity curve to compare $\pi(\theta; y)$, $\pi_Q(\theta; y)$ and $\pi_E(\theta; y)$, let us consider a sample of $n = 10$ observations drawn from an $N(\theta, 1)$, with $\theta = 0$, and the prior $\pi(\theta) \sim N(0.5, 1)$. The largest observation is allowed to vary in the range $[-5, 5]$. For each sample, $\pi_Q(\theta; y)$ and $\pi_E(\theta; y)$, based on (12) with $k = 1.345$, and $\pi(\theta; y)$ have been computed. Attention is focused on twice the natural logarithm of the Bayes factors for choosing between the simple hypotheses $H_0: \theta = 0.5$ versus $H_1: \theta = 0$ (Fig. 3). Clearly, one observation may have unbounded influence on the classical Bayes factor $B_{01}$ based on the genuine likelihood: evidence against $H_1$ increases as one observation becomes larger, whereas, moving to the left, the data support $H_1$ with growing strength. On the contrary, the use of $L_Q(\theta)$ or of $L_E(\theta)$ in the evaluation of the Bayes factors, on the one hand, leads to rejecting $H_0$ in each case, as twice the natural logarithm of the pseudo-Bayes factors is less than 2, and, on the other hand, provides us with a lower and an upper bound when measuring the evidence for $H_1$ with respect to $H_0$.


Fig. 3. Sensitivity curves related to twice the natural logarithm of the Bayes factors evaluated from $\pi(\theta; y)$ (−), $\pi_Q(\theta; y)$ (∗) and $\pi_E(\theta; y)$ (◦).

Table 2
Location model: mean (and sd) of $KL(\pi(\theta; y); \pi_Q(\theta; y))$, $KL(\pi(\theta; y); \pi_E(\theta; y))$ and (7)

n     KL(π; πQ)      KL(π; πE)      (7)
5     0.02 (0.04)    0.17 (0.15)    0.02 (0.06)
10    0.02 (0.04)    0.12 (0.21)    0.02 (0.05)
50    0.02 (0.03)    0.02 (0.03)    0.02 (0.04)

The sampling model is N(0, 1); the prior distribution is N(0.5, 1).


Kullback–Leibler divergence: The aim is to evaluate the discrepancy between the pseudo-posterior distributions $\pi_Q(\theta; y)$ and $\pi_E(\theta; y)$ and the proper posterior distribution $\pi(\theta; y)$, in terms of the finite-sample expression of the Kullback–Leibler divergence and of its asymptotic expression (7). To this end a Monte Carlo study has been performed with $n = 5, 10, 50$. The results in Table 2 show the closeness of $\pi(\theta; y)$ and the robust posterior distributions $\pi_Q(\theta; y)$ and $\pi_E(\theta; y)$, since the estimated values of the Kullback–Leibler divergences are quite small. Note that the results in Table 2 confirm the asymptotic theoretical result of formula (7), since the estimated Kullback–Leibler divergences converge to the asymptotic value. Here, we report the results obtained for a single elicited prior distribution, but the same behavior has been observed for different prior distributions. It is worth noting that the distribution of $KL(\pi(\theta; y); \pi_E(\theta; y))$ is non-negligibly skewed for $n = 5, 10$ and several extreme values occurred; indeed, the empirical posterior takes into account the configuration of the data, and asymmetries arise in $\pi_E(\theta; y)$ (see also Lazar, 2003).

Stability of robust posterior summaries: The objective is to compare the finite-sample accuracy of the inferential procedures based on posterior distributions of form (5) with those based on the corresponding genuine posterior distribution. The entries in Table 3 show that $\pi_Q(\theta; y)$ and $\pi_E(\theta; y)$ can be used successfully to obtain robust (in a frequentist sense) posterior summaries both when the model is correctly specified and when the true sampling model lies in an $\varepsilon$-neighborhood of the assumed model. Indeed, the expected value of the genuine posterior distribution $\pi(\theta; y)$ behaves unsatisfactorily under contamination in terms of bias or variability, while the posterior summaries of $\pi_Q(\theta; y)$ and $\pi_E(\theta; y)$ are more stable and less affected by departures of the data from the specified model. Their robustness properties are inherited from the properties of the estimating function used to obtain the involved pseudo-likelihood function.


Table 3
Location model: mean (and sd) of posterior means under different scenarios

True model
n     E(θ; y)        EQ(θ; y)       EE(θ; y)       ModeE(θ; y)
5     0.08 (0.37)    0.08 (0.37)    0.06 (0.40)    0.08 (0.39)
10    0.05 (0.29)    0.05 (0.29)    0.06 (0.29)    0.05 (0.30)
50    0.02 (0.13)    0.01 (0.13)    0.02 (0.13)    0.01 (0.13)

H = N(0, 10²)
5     0.06 (0.95)    0.07 (0.42)    0.08 (0.46)    0.06 (0.42)
10    0.04 (0.97)    0.05 (0.34)    0.12 (0.43)    0.07 (0.30)
50    0.03 (0.46)    0.01 (0.16)    0.09 (0.23)    0.02 (0.16)

H = N(10, 1)
5     0.94 (1.52)    0.26 (0.46)    0.36 (0.49)    0.28 (0.43)
10    0.96 (0.29)    0.21 (0.31)    0.59 (0.18)    0.24 (0.27)
50    0.99 (0.13)    0.19 (0.14)    0.35 (0.13)    0.20 (0.14)

The sampling model is N(0, 1); the prior distribution is N(0.5, 1).


Validity of Bayesian inference: Finally, we discuss the validity of (5) using a criterion for evaluating whether or not an alternative likelihood is proper for Bayesian inference (Monahan and Boos, 1992). These authors introduce a definition of validity based on the coverage properties of posterior sets. In practice, they compute the statistic $H = \int_{-\infty}^{\theta} \pi_{PS}(t; y)\, \mathrm{d}t$, which corresponds to posterior coverage set functions of the form $(-\infty, t_\alpha]$, where $t_\alpha$ is the $\alpha$th percentile of a pseudo-posterior distribution. The pseudo-posterior $\pi_{PS}(\theta; y)$ is regarded as valid by coverage if $H$ is uniformly distributed in $(0, 1)$.

Validity of Bayesian inference for the empirical likelihood was assessed in Lazar (2003); that for the quasi-likelihood is investigated here. In particular, paralleling Lazar (2003) for the empirical likelihood in a location model, we generated values of the mean $\theta$ from two normal priors, i.e. $N(0, 0.5^2)$ and $N(0, 5^2)$. Then, observations $(y_1, \ldots, y_n)$ were generated from an $N(\theta, 1)$, for $n = 5, 10, 20, 50$. For each $n$, this procedure was repeated 1000 times, and we calculated the statistic $H$ in each case, using the quasi-likelihood function. The uniform probability plots for $n = 5, 10, 20, 50$ and the two normal priors $N(0, 0.5^2)$ and $N(0, 5^2)$ are given in Fig. 4. In these cases the p-values of the Kolmogorov–Smirnov tests for uniformity are always larger than 0.5 for each value of $n$. We conclude that, contrary to the empirical likelihood, which requires $n \geq 50$, the quasi-likelihood with a conjugate prior gives valid Bayesian inference for each value of the sample size $n$.

Fig. 4. Uniform probability plots to check the validity of the quasi-likelihood function.
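The following sketch (ours) illustrates the Monahan–Boos recipe in the exactly valid conjugate case, where $H$ is the genuine posterior CDF evaluated at the generating $\theta$ and is exactly uniform; validating the quasi-posterior amounts to replacing this CDF with that of $\pi_Q(\theta; y)$.

```python
import numpy as np
from scipy.stats import norm, kstest

rng = np.random.default_rng(4)
n, tau2, mu0 = 10, 0.5**2, 0.0
H = np.empty(1000)
for j in range(1000):
    theta = rng.normal(mu0, np.sqrt(tau2))      # draw theta from the prior
    y = rng.normal(theta, 1.0, size=n)          # data given theta
    v = 1 / (1 / tau2 + n)                      # conjugate posterior variance
    m = v * (mu0 / tau2 + n * y.mean())         # conjugate posterior mean
    H[j] = norm.cdf(theta, m, np.sqrt(v))       # posterior CDF at the true theta
print(kstest(H, "uniform").pvalue)              # large p-value: H is uniform
```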

4.3. Example: scale model

Consider a sample of $n$ observations drawn from an exponential model with $F_\theta = 1 - \exp(-\theta y)$, assumed to be the sampling model. Let us focus on the optimal Hampel estimator (see Hampel et al., 1986, Chapter 2), based on the estimating equation

$$\psi(y; \theta) = \max(-b, \min(b, a - \theta y)), \qquad (15)$$

for appropriate constants $a$ and $b$. From (15) the quasi- and the empirical likelihoods are very easy to derive; in particular, a closed form expression for $L_Q(\theta)$ is discussed in Adimari and Ventura (2002b).

Assume that the prior density of the parameter $\theta$ is the Jeffreys prior $\pi(\theta) \propto \theta^{-1}$, that is, the usual noninformative prior for a scale parameter. In this case, $\pi(\theta; y)$ is a gamma distribution with shape parameter $n$ and scale parameter $s = n\bar{y}$. Thus, the posterior mean is given by $E(\theta; y) = 1/\bar{y}$, which clearly has unbounded IF. The same result holds also for the posterior mode $\tilde{\theta} = (n - 1)/(n\bar{y})$. On the contrary, if we consider $\pi_Q(\theta; y)$ or $\pi_E(\theta; y)$ derived from (15), then the IF of the robust posterior mean or of the robust posterior mode is proportional to $\mathrm{IF}(x; T_\pi, F_\theta) \propto \psi(x; \theta)$, which is clearly bounded.
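A minimal sketch (our addition) of (15) in use; the constants $a$ and $b$ below are purely illustrative (in practice $a$ must be calibrated so that $E_\theta(\psi) = 0$ and $b$ is fixed by the desired efficiency).

```python
import numpy as np
from scipy.optimize import brentq

a, b = 1.0, 0.5                                     # hypothetical tuning constants

def Psi(theta, y):
    return np.clip(a - theta * y, -b, b).sum()      # sum of (15) over the sample

rng = np.random.default_rng(5)
y = np.append(rng.exponential(1.0, size=30), 40.0)  # Exp(1) sample plus one large outlier
theta_hat = brentq(lambda t: Psi(t, y), 1e-6, 100.0)
print(1 / y.mean(), theta_hat)   # the MLE 1/ybar is pulled down; the M-estimate much less
```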

Kullback–Leibler divergence: The aim is to evaluate the discrepancy between $\pi_Q(\theta; y)$ and $\pi_E(\theta; y)$ and the proper posterior distribution $\pi(\theta; y)$. Data are generated from an exponential model with mean 1 (Exp(1)). The results in Table 4 show the closeness of $\pi(\theta; y)$ and the robust posterior distributions, and confirm the asymptotic theoretical result of formula (7). In this case, the divergences increased on average with respect to the location model, and so did their variability. This is due to a more marked asymmetry in their empirical distribution. Moreover, in this case $\pi_Q(\theta; y)$ seems to be close to $\pi(\theta; y)$, although $\pi_E(\theta; y)$ seems to be preferable. Indeed, $\pi_E(\theta; y)$ presents smaller values of the finite-sample Kullback–Leibler divergence.

Stability of robust posterior summaries: To assess the accuracy and stability of robust posterior summaries, we compare the results obtained under the true model Exp(1) and also when the true sampling model lies in an $\varepsilon$-neighborhood of the assumed model. As in Example 4.2, from Table 5 we can note that both $\pi_Q(\theta; y)$ and $\pi_E(\theta; y)$ can be used successfully to construct robust (in a frequentist sense) posterior summaries for the parameter of interest, since they provide posterior summaries that are not affected by the contamination, unlike the genuine posterior summaries.


Table 4
Scale model: mean (and sd) of $KL(\pi(\theta; y); \pi_Q(\theta; y))$, $KL(\pi(\theta; y); \pi_E(\theta; y))$ and (7)

n     KL(π; πQ)      KL(π; πE)      (7)
10    0.12 (0.13)    0.08 (0.19)    0.06 (0.07)
20    0.11 (0.12)    0.08 (0.08)    0.06 (0.07)
50    0.09 (0.12)    0.07 (0.07)    0.05 (0.06)

The sampling model is an exponential model with θ = 1; the prior distribution is the Jeffreys prior.

Table 5
Scale model: mean (and sd) of posterior means under different scenarios

      True model                                  H = Weibull(1.5, 10)
n     E(θ; y)       EQ(θ; y)      EE(θ; y)        E(θ; y)       EQ(θ; y)      EE(θ; y)
5     0.96 (0.24)   1.03 (0.43)   1.11 (0.79)     0.78 (0.43)   0.92 (0.48)   1.15 (0.61)
10    1.05 (0.26)   1.05 (0.39)   1.15 (0.75)     0.72 (0.40)   0.87 (0.39)   1.19 (0.67)
50    1.02 (0.15)   1.01 (0.17)   1.06 (0.16)     0.78 (0.13)   0.94 (0.16)   1.13 (0.17)

The sampling model is an exponential model with θ = 1; the prior distribution is the Jeffreys prior.


5. Final remarks

Quasi-likelihood and empirical likelihood represent two approaches for constructing robust posterior distributions which take into account sampling model misspecifications. The examples and simulation results discussed in the previous sections show that both $\pi_Q(\theta; y)$ and $\pi_E(\theta; y)$ can be used successfully to perform robust Bayesian inference. Generally, our experience, based on the examples discussed in Section 4 and other examples not reported here, indicates that, contrary to $\pi_Q(\theta; y)$, $\pi_E(\theta; y)$ is not computable for very small values of $n$, and for moderate sample sizes it always appears to have heavier tails. For large values of $n$, $\pi_Q(\theta; y)$ and $\pi_E(\theta; y)$ are equivalent, and in this case the choice between $L_Q(\theta)$ and $L_E(\theta)$ may depend on the problems related to their computation and on the assumptions made on the model $F_\theta$.

The examples presented in the previous section are very simple and concern models indexed by scalar parameters. However, the robust pseudo-likelihoods can be computed also for more complex situations, such as generalized linear models, and can also deal with nuisance parameters. In fact, their profile versions are also available and have the same robustness properties.

Finally, the sensitivity of $\pi_Q(\theta; y)$ and $\pi_E(\theta; y)$ with respect to prior specifications requires more investigation. In particular, it would be interesting to investigate the relationship between Bayesian robustness and likelihood robustness, in order to understand how these two requirements can counterbalance or be linked. A possible starting point could be expression (6).

References

Adimari, G., 1997. Empirical likelihood type confidence intervals under random censorship. Ann. Inst. Statist. Math. 49, 447–466.


Adimari, G., Ventura, L., 2002a. Quasi-profile loglikelihood for unbiased estimating functions. Ann. Inst. Statist. Math. 54, 235–244.
Adimari, G., Ventura, L., 2002b. Quasi-likelihood from M-estimators: a numerical comparison with empirical likelihood. Statist. Methods Appl. 11, 175–185.
Andrade, J.A.A., O'Hagan, A., 2006. Bayesian robustness modeling using regularly varying distributions. Bayesian Anal. 1, 169–188.
Berger, J.O., 1994. An overview of robust Bayesian analysis (with discussion). Test 3, 5–124.
Bernardo, J.M., Smith, A.F.M., 2000. Bayesian Theory. Wiley Series in Probability and Statistics, New York.
Bertolino, F., Racugno, W., 1992. Analysis of the linear correlation coefficient using pseudo-likelihoods. J. Italian Statist. Soc. 1, 33–50.
Bertolino, F., Racugno, W., 1994. Robust Bayesian analysis of variance and the χ²-test by using marginal likelihoods. Statistician 43, 191–201.
Cabras, S., Racugno, W., Ventura, L., 2006. Bayesian inference on the scalar skew-normal distribution. In: Proceedings of COMPSTAT 2006, 17th Symposium of IASC-ERS, pp. 1373–1380.
Dawid, A.P., 1973. Posterior expectations for large observations. Biometrika 60, 664–667.
Dey, D.K., Ghosh, S.H., Lou, K., 1996. On local sensitivity measures in Bayesian analysis (with discussion). In: Bayesian Robustness, IMS Lecture Notes—Monograph Series, vol. 29.
Efron, B., 1993. Bayes and likelihood calculations from confidence intervals. Biometrika 80, 3–26.
Gustafson, P., 1996. Model influence functions based on mixtures. Canad. J. Statist. 24, 535–548.
Hampel, F.R., Ronchetti, E.M., Rousseeuw, P.J., Stahel, W.A., 1986. Robust Statistics. The Approach Based on Influence Functions. Wiley, New York.
Heritier, S., Ronchetti, E., 1994. Robust bounded influence tests in general parametric models. J. Amer. Statist. Assoc. 89, 897–904.
Huber, P.J., 1967. The behavior of maximum likelihood estimates under nonstandard conditions. In: Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, vol. I, pp. 221–233.
Huber, P.J., 1981. Robust Statistics. Wiley, New York.
Lavine, M., 1991. Sensitivity in Bayesian statistics: the prior and the likelihood. J. Amer. Statist. Assoc. 86, 396–399.
Lavine, M., 1995. On an approximate likelihood for quantiles. Biometrika 82, 220–222.
Lazar, N.A., 2003. Bayesian empirical likelihood. Biometrika 90, 319–326.
Markatou, M., Ronchetti, E., 1997. Robust inference: the approach based on influence functions. Handbook of Statistics 15, 49–75.
Markatou, M., Basu, A., Lindsay, B.G., 1998. Weighted likelihood equations with bootstrap root search. J. Amer. Statist. Assoc. 93, 740–750.
McCullagh, P., 1991. Quasi-likelihood and estimating functions. In: Hinkley, D.V., Reid, N., Snell, E.J. (Eds.), Statistical Theory and Modelling, pp. 265–286.
Monahan, J.F., Boos, D.D., 1992. Proper likelihoods for Bayesian analysis. Biometrika 79, 271–278.
Monti, A.C., Ronchetti, E., 1993. On the relationship between empirical likelihood and empirical saddlepoint approximation for multivariate M-estimators. Biometrika 80, 329–338.
O'Hagan, A., 1979. On outlier rejection phenomena in Bayes inference. J. Roy. Statist. Soc. B 41, 358–367.
O'Hagan, A., 1988. Modelling with heavy tails. Bayesian Statist. 3, 345–359.
O'Hagan, A., 1990. Outliers and credence for location parameter inference. J. Amer. Statist. Assoc. 85, 172–176.
Owen, A.B., 2001. Empirical Likelihood. Chapman & Hall, London.
Pace, L., Salvan, A., 1997. Principles of Statistical Inference from a Neo-Fisherian Perspective. World Scientific, Singapore.
Pace, L., Salvan, A., Ventura, L., 2006. Likelihood based discrimination between separate scale and regression models. J. Statist. Plann. Inference 136, 3539–3553.
Paranjpe, S.A., Gore, A.P., 1994. Selecting variables for discrimination when covariance matrices are unequal. Statist. Probab. Lett. 21, 417–419.
Passarin, K., 2004. Robust Bayesian estimation. Working Paper 2004.11, University of Insubria, Italy.
Peracchi, F., 1990. Robust M-estimators. Econometric Rev. 9, 1–30.
Peracchi, F., 1991. Robust M-tests. Econometric Theory 7, 69–84.
Pericchi, L.R., Perez, M.E., 1994. Posterior robustness with more than one sampling model. J. Statist. Plann. Inference 40, 279–294.
Pericchi, L.R., Sansò, B., 1995. A note on bounded influence in Bayesian analysis. Biometrika 82, 223–225.
Racugno, W., Salvan, A., Ventura, L., 2005. On the use of pseudo-likelihoods in Bayesian variable selection. Working Paper 2005.12, Department of Statistics, University of Padua, Italy.
Raftery, A.E., Madigan, D., Volinsky, C.T., 1996. Accounting for model uncertainty in survival analysis improves predictive performance. Bayesian Statist., 323–349.
Rios Insua, D., Ruggeri, F., 2000. Robust Bayesian Analysis. Springer, New York.
Royall, R., 1986. Model robust confidence intervals using maximum likelihood estimators. Internat. Statist. Rev. 54, 221–226.
Schennach, S.M., 2005. Bayesian exponentially tilted empirical likelihood. Biometrika 92, 31–46.
Severini, T.A., 1999. On the relationship between Bayesian and non-Bayesian elimination of nuisance parameters. Statist. Sinica 9, 713–724.
Shyamalkumar, N.D., 2000. Likelihood robustness. In: Robust Bayesian Analysis. Springer, New York.
Sivaganesan, S., 1993. Robust Bayesian diagnostics. J. Statist. Plann. Inference 35, 171–188.
Spiegelhalter, D.J., 1985. Exact Bayesian inference on the parameters of a Cauchy distribution with vague prior information. Bayesian Statist. 2, 743–750.
Tierney, L., Kadane, J.B., 1986. Accurate approximations for posterior moments and marginal densities. J. Amer. Statist. Assoc. 81, 82–86.
Tsao, M., Zhou, J., 2001. On the robustness of empirical likelihood ratio confidence intervals for location. Canad. J. Statist. 29, 129–140.
Tsou, T.S., Royall, R.M., 1995. Robust likelihoods. J. Amer. Statist. Assoc. 90, 316–320.
Verdinelli, I., Wasserman, L., 1991. Bayesian analysis of outlier problems using the Gibbs sampler. Statist. Comput. 1, 105–117.

