+ All documents
Home > Documents > Species non-exchangeability in probabilistic ecotoxicological risk assessment

Species non-exchangeability in probabilistic ecotoxicological risk assessment

Date post: 18-Nov-2023
Category:
Upload: independent
View: 0 times
Download: 0 times
Share this document with a friend
20
© British Crown Copyright 2011 0964–1998/12/175243 J. R. Statist. Soc. A (2012) 175, Part 1, pp. 243–262 Species non-exchangeability in probabilistic ecotoxicological risk assessment Peter S. Craig and Graeme L. Hickey, Durham University, UK Robert Luttik National Institute for Public Health and the Environment, Bilthoven, The Netherlands and Andy Hart Food and Environment Research Agency,York, UK [Received June 2009. Final revision March 2011] Summary. Current ecotoxicological risk assessment for chemical substances is based on the assumption that tolerances of all species in a specified ecological community are a priori exchangeable for each new substance.We demonstrate non-exchangeability by using a large database of tolerances to pesticides for fish species and extend the standard statistical model for species tolerances to allow for the presence of a single species which is considered non- exchangeable with others. We show how to estimate parameters and adjust decision rules that are used in ecotoxicological risk management.Effects of parameter uncertainty are explored and our model is compared with a previously published less tractable alternative. We conclude that the model and decision rules that we propose are pragmatic compromises between conflicting needs for more realistic modelling and for straightforwardly applicable decision rules. Keywords: Assessment factors; Ecotoxicology; Exchangeability; Risk assessment; Species sensitivity 1. Introduction Much of modern statistics is concerned with models of increasing complexity, with goals of achieving greater realism and with addressing more complex inferences. However, some areas of risk management and decision making, such as ecotoxicological risk assessment (ERA), are resistant to such complexity and are unwilling to use rules which do not take simple intuitive forms. We examine ERA and show how a weakness in standard modelling can be addressed pragmatically, leading to adjustments to standard decision rules which should be comprehensi- ble and usable by risk managers. Such procedures are more likely to be acceptable and therefore to be adopted. ERA is an important tool for restricting the potential ecological damage from chemical substances, such as general chemicals or pesticides, while still permitting industry and agriculture to use them to their advantage. This has gained wider attention since the phased introduction of the new ‘Registration, evaluation, authorisation and restriction of chemicals’ regulation (Euro- pean Commission, 2006) in 2007. It is required that manufacturers and importers gather in- formation on the properties of all their substances, which will allow their safe usage. One such Address for correspondence: Peter S. Craig, Department of Mathematical Sciences, University of Durham, South Road, Durham, DH1 3LE, UK. E-mail: [email protected]
Transcript

© British Crown Copyright 2011 0964–1998/12/175243

J. R. Statist. Soc. A (2012)175, Part 1, pp. 243–262

Species non-exchangeability in probabilisticecotoxicological risk assessment

Peter S. Craig and Graeme L. Hickey,

Durham University, UK

Robert Luttik

National Institute for Public Health and the Environment, Bilthoven, The Netherlands

and Andy Hart

Food and Environment Research Agency, York, UK

[Received June 2009. Final revision March 2011]

Summary. Current ecotoxicological risk assessment for chemical substances is based on theassumption that tolerances of all species in a specified ecological community are a prioriexchangeable for each new substance. We demonstrate non-exchangeability by using a largedatabase of tolerances to pesticides for fish species and extend the standard statistical modelfor species tolerances to allow for the presence of a single species which is considered non-exchangeable with others. We show how to estimate parameters and adjust decision rules thatare used in ecotoxicological risk management.Effects of parameter uncertainty are explored andour model is compared with a previously published less tractable alternative. We conclude thatthe model and decision rules that we propose are pragmatic compromises between conflictingneeds for more realistic modelling and for straightforwardly applicable decision rules.

Keywords: Assessment factors; Ecotoxicology; Exchangeability; Risk assessment; Speciessensitivity

1. Introduction

Much of modern statistics is concerned with models of increasing complexity, with goals ofachieving greater realism and with addressing more complex inferences. However, some areasof risk management and decision making, such as ecotoxicological risk assessment (ERA), areresistant to such complexity and are unwilling to use rules which do not take simple intuitiveforms. We examine ERA and show how a weakness in standard modelling can be addressedpragmatically, leading to adjustments to standard decision rules which should be comprehensi-ble and usable by risk managers. Such procedures are more likely to be acceptable and thereforeto be adopted.

ERA is an important tool for restricting the potential ecological damage from chemicalsubstances, such as general chemicals or pesticides, while still permitting industry and agricultureto use them to their advantage. This has gained wider attention since the phased introduction ofthe new ‘Registration, evaluation, authorisation and restriction of chemicals’ regulation (Euro-pean Commission, 2006) in 2007. It is required that manufacturers and importers gather in-formation on the properties of all their substances, which will allow their safe usage. One such

Address for correspondence: Peter S. Craig, Department of Mathematical Sciences, University of Durham,South Road, Durham, DH1 3LE, UK.E-mail: [email protected]

244 P. S. Craig, G. L. Hickey, R. Luttik and A. Hart

safety issue is the effect of environmental exposure to the substance, controlled or otherwise,on ecological (multispecies) communities, e.g. freshwater species. We defer a detailed discussionof ERA, and the underlying statistical model that is accepted by regulators, to Section 2 andproceed here with a simplified description of the statistical problem.

A simple view of the statistical aspects of ERA is that each substance defines a populationof tolerances, expressed as concentrations or doses, where the tolerance is an attribute of aspecies rather than of individuals. We wish to determine a concentration or dose, which isknown here as the environmental level of concern (ELC) for a substance, below which adverseeffects are unlikely to occur to the ecological community being considered. However, practical-ities and ethics mean that tolerances are measured for only a small number of species. Severalapproaches have been proposed for determining the ELC. The simplest is to divide the lowestmeasured tolerance by an assessment factor—an arbitrarily defined large fixed number whichconservatively accounts for variability and uncertainty. This is motivated by the ‘precautionaryprinciple’ which, in the context of ERA, Forbes and Calow (2002a) defined as

‘applying controls to chemicals in advance of scientific understanding if there is a presumption thatharm will be caused’.

A more refined approach, which we follow, is to adopt a simple statistical model for the mea-sured tolerances which are treated as a random sample from a population of species tolerancesand to use the model to help to determine the ELC.

In practice, the species that are measured are not chosen randomly but the same procedureis followed, based effectively on the more realistic assumption,which is familiar to the Bayesiancommunity, that all species tolerances for the new substance are a priori exchangeable. However,there is a body of informal evidence that the assumption of exchangeability is invalid, particu-larly in relation to pesticide exposure for one fish species, Oncorhynchus mykiss (rainbow trout).We explore a sequence of issues that are necessary to gaining a good view on how practicallyto allow for non-exchangeability in ERA: testing for non-exchangeability, tractable extensionof standard modelling, estimation of hyperparameters representing non-exchangeability andvariance heterogeneity, risk measures and rules for determining the ELC, defensibility of a keyassumption and alternative models for non-exchangeability.

The crux of the issue is that simplicity may be better than complexity, even when simplicityresults in some relative weaknesses. The take-up of more complex statistical methodology inecotoxicology is slow. Moreover, the regulatory process is controlled indirectly by legislation anddirectly by the risk managers who are not research scientists but who are required to be able todefend the risk management process when it is scrutinized by commercial or consumer interests.Procedures which involve relatively small adaptations of familiar techniques are seen to be moretransparent and to be more defensible. Thus our focus is on the detection of non-exchangeabilityand on tractable ways to adapt current ERA methodology to allow for non-exchangeability ina pragmatic and parsimonious manner.

2. Ecotoxicological risk assessment

The decision-making process in ERA is based on so-called risk characterization (EuropeanChemicals Agency, 2008a) which involves

(a) estimation of the predicted exposure concentration which might be found in an ecosystem,i.e. the wider interaction of the different ecological communities (assemblages of multi-species populations) and physical components (e.g. air and water) of an environment, e.g.a ditch, and

Probabilistic Ecotoxicological Risk Assessment 245

(b) assessment of the degree to which the predicated exposure concentration may have adverseconsequences on the communities.

Under current European Union regulatory technical guidance, this fundamental approachto conducting ERAs for general chemicals (European Chemicals Agency, 2008a) and pesticides(European Commission, 2002), which we denote generically as substances from here onwards, isbased on a tiered process. At the lowest tier, the assessment is intended to be simple and econom-ical, yet at the same time robustly conservative. A high tier risk assessment, which is typicallymuch more expensive, is triggered by the failure of lower tiers and generally calls for a detailedjoint probabilistic assessment of (a) and (b) specific to each exposure scenario and ecologicalassemblage; the resulting ERA dossier is subsequently assessed carefully by expert scientists.Since it is not logistically practical to assess the risk to every species within every ecosystem, lowerand intermediate quantitative tiers focus on the consequences for individual species on the basisof a small number of tolerance measurements; the calculations act as a proxy for all ecosystems.

We focus on the intermediate tier of risk assessment. Here, the fundamental decision makingcriterion is if the ELC is greater than the predicted exposure concentration the risk is deemedacceptable; otherwise permission for use is prohibited pending a higher tier assessment. We shalllimit our discussion to aquatic ERA to simplify the language, but the methods discussed areapplicable in a wider context, e.g. to bird-only risk assessment. In this section we provide detailson two features of this problem, assessment factors and species sensitivity distributions, andelaborate further on the motivation for this research, non-exchangeability.

2.1. Assessment factorsExposure is expressed as a concentration of the substance in water, and toxicity of the substanceto a specific species (or genus) type is described in terms of a ‘tolerance’ concentration whichyields a specific effect. A common choice is the median effect concentration EC50. This is theconcentration which is statistically estimated to affect 50% of individuals for a single-speciespopulation in some fixed time period (often 24–96 h) with respect to some chosen relevant mea-surable ecological end point, such as mortality. Species tolerance values for a specific substance,which are collectively referred to as toxicity data, will usually be estimated, and subsequentlytreated as known, only for a very small number of distinct species.

The standard first-tier deterministic procedure determines the ELC by dividing the lowestmeasured tolerance by an ‘assessment factor’. This is a positive fixed number (usually a powerof 10 such as 1000) defined in the appropriate regulatory technical guidance document andwhich is intended to allow for

(a) variation between and within species,(b) differences between acute and chronic sensitivity and(c) extrapolation from laboratory (i.e. single-species tolerance) to field (i.e. ecosystems) im-

pact.

However, little or no justification is provided for its magnitude, leading to ambiguity about theactual level of intended protection (Forbes and Calow, 2002a).

2.2. Species sensitivity distributionsConsiderable attention has been given to probabilistic techniques to derive ELCs. The funda-mental underlying concept is the ‘species sensitivity distribution’ (SSD) (Posthuma et al., 2002),which, for a specific substance, is a distribution modelling the interspecies variability of toler-ance in an ecological community, thus providing a way, separate from any use of assessment

246 P. S. Craig, G. L. Hickey, R. Luttik and A. Hart

factors for other purposes, to relate formally the tolerances of tested species to those of otheruntested species. There is no consensus on how to define the ecological community; Aldenberget al. (2002) call this ‘the Achilles heel of the SSDeology’. A weakness of the concept is thefailure of measured species to represent communities (Forbes Calow, 2002b), yet more refinedapproaches are stifled by limitations on data. Specific models which do address this, e.g. byweightings (Grist et al., 2006), are too complex for regular application in the intermediate tierof risk assessment. Consequently, tolerance measurements for standard test species often act asproxies for many communities. It is the role of higher tier ERA to assess risk to (exposure site-)specific communities.

Standard parametric models for the SSD, motivated by pragmatism, are the log-normal distri-bution (Wagner and Løkke, 1991) and the log-logistic distribution (Aldenberg and Slob, 1993).Considerable attention (Hickey et al. (2008, 2009) and references therein) has been given to theproblem of quantitative assessment of uncertainty concerning the pth percentile of the SSD(which is denoted HCp). This is interpreted as the concentration which is hazardous to p% ofspecies in an ecological community (Alexander and Fairbridge (1999), page 235), and for allintents and purposes defines the ELC subject to an additional SSD-specific assessment factor.A widely accepted protection goal is p=5 (European Chemicals Agency, 2008b). In Fig. 1 weshow an SSD estimated from tolerances for fish species exposed to the herbicide trifuralin.

The distributional assumptions and standard approaches to quantifying risk lead to rules fordetermining the ELC which typically all have the same form: the geometric mean of the toxicitydata divided by a ‘variable assessment factor’ which is determined by the standard deviation ofthe SSD and the level of uncertainty. Determining this variable assessment factor has been thefocus of recent research (Aldenberg et al., 2002; Hickey et al., 2009).

2.3. Non-exchangeabilityThe concept of SSDs involves many assumptions, some of which are untestable (Forbes andCalow, 2002b). However, with a few exceptions such as Duboudin et al. (2004), one notable

0.0

0.2

0.4

0.6

0.8

1.0

Concentration (mg/L)

Pro

port

ion

of s

peci

es

10−2 10−1 100

Barbus sharpeyi

Carassius auratus

Cyprinus carpio

Ictalurus punctatus

Micropterus salmoides

Oncorhynchus mykiss

Pimephales promelas

Fig. 1. Estimated SSDs for fish exposed to the herbicide trifuralin (each point represents an EC50-value forthe species labelled): , estimate of HC5

Probabilistic Ecotoxicological Risk Assessment 247

implicit assumption in the modelling literature is that, before observing the toxicity data for asubstance, the tolerances of all species in the ecological community are exchangeable. A directimplication of this is that information about relative rankings of species’ tolerances in SSDs forother substances is uninformative about their relative rankings for the substance being assessed.An important statistical consequence of this is that any measurements to be made for the sub-stance may be considered to be a random sample from its uncertain SSD regardless of whichspecies are to be measured.

The informal body of evidence (e.g. Dwyer et al. (2005)) which suggests that Oncorhynchusmykiss, and possibly other species, are non-exchangeable with respect to other fish species issupported by a recent report of the European Food Safety Authority (2005). Despite this,Oncorhynchus mykiss is a standard test species (Rand (1995), page 78).

The issue of (non-)exchangeability has largely been ignored in ERAs. Raimondo et al. (2008)issued caution about conducting ERAs on the basis of the use of certain groups of speciesas proxies for all fish due to an apparent demonstration of higher tolerance. Stephan (2002)reported that one might purposefully populate estimated SSDs with recognizably less tolerantspecies to ensure conservatism, acknowledging that this ad hoc method violates SSD assump-tions. Alternative methods such as bootstrapping described by Newman et al. (2000) mayaccount for these effects, although it is not explicitly clear how. Grist et al. (2006) proposed theconstruction of community level SSDs as mixtures of distributions for taxonomic subgroups,thereby acknowledging different tolerances of specific species groups.

A natural response of a statistical modeller (including some reviewers of this paper) wouldbe to abandon exchangeability and to use a crossed random-effects model (Goldstein (1995),chapter 8) incorporating both species and substance effects, although some adaptation of thestandard model would be required to allow for observed heterogeneity in tolerance variabilitybetween substances. Although that might succeed from a modelling perspective, it would sub-stantially complicate the risk assessment procedure for several reasons. First, the incompletefactorial nature of any available database of measured tolerances would lead to highly con-founded estimates of individual species and substance effects. Consequently, uncertainty thatis attached to those estimates would be substantial and strongly correlated and would requirecareful propagation into decision rules. Secondly, it would not be possible to summarize therelevant information in an entire toxicity database through a small number of estimated par-ameters. The database would have to be made available to all participants in ERA and access toproprietary data would be an issue. Finally, the whole concept of the SSD and its use in ERAwould require substantial reconsideration by ecotoxicologists. For example, unlike the cur-rent situation, making inferences about a percentile would require knowledge of the currentlyunspecified number of species in the ecological community. Overall, persuading risk managersto accept any resulting procedures would be extremely difficult.

3. Testing the assumption of exchangeability

The European Food Safety Authority (2005) provided an informal demonstration that On-corhynchus mykiss may be non-exchangeable, showing graphically that its tolerance tended tobe less than the geometric mean tolerance of other species measured on the same pesticide. Weprovide a more formal approach.

We investigate the null hypothesis that species tolerances are a priori exchangeable for eachnew substance, particularly pesticides. We propose two non-parametric tests, based on the ranksof an available toxicity database that is described below, motivated by the familiar sign and ranksum tests for differences between two populations; the latter is more powerful but less robust as

248 P. S. Craig, G. L. Hickey, R. Luttik and A. Hart

it is more sensitive to outcomes for individual substances. We chose a non-parametric approachto testing, despite the fact that the modelling approach in later sections is parametric, so thatwe could be sure that any test we used was actually providing evidence of non-exchangeabilityrather than evidence against parametric assumptions.

3.1. DataThe data that we use were kindly supplied by the Dutch National Institute for Public Health andthe Environment (DNIPHE) and comprise 1903 EC50 tolerance measurements for 172 distinctfish species and 379 different substances, in this case pesticides. The data, which were previouslyused by the European Food Safety Authority (2005), are a subset of a research database thatwas developed by De Zwart (2002) which has been amalgamated from many sources.

Henceforth, yij is the logarithm (base 10) of the tolerance of species j for substance i and theterm SSD refers to the distribution of yij for fixed i. The number of species tested on substancei in the database is denoted ni, and mj is the number of substances on which species j has beentested. We also denote rij to be the rank of the measurement for species j among those tested onsubstance i, ties being assigned the average of the corresponding ranks. We use log-transformedtolerance for several reasons:

(a) variability is stabilized (leading to additive errors);(b) resulting distributions are often quite close to normal;(c) it is conventional in many areas of toxicology.

The data are by no means a complete factorial design; EC50 has been measured for only 1903of the possible 65188 substance–species pairs. There are 143 substances for which ni =2, another135 with ni � 5, 64 with 6 �ni � 10, 30 with 11 �ni � 20 and seven with ni ranging from 21 to47. From the species viewpoint, there are 74 for which mj =1, 22 with mj =2, another 26 withmj � 5, 19 with 6 � mj � 10, 13 with 11 � mj � 20, 11 with 21 � mj � 50 and seven individualspecies where mj is 54, 59, 76, 153, 160, 166 and 344. The last of these is Oncorhynchus mykisswhich is the focus of much of this paper.

3.2. Sign testUnder the null hypothesis of exchangeability, the tolerance of a species should be equally likelyto appear above or below the median of the data for each substance. For each species, we canapply the binomial distribution to determine whether it occurs too often on one or other side.We ignore those substances where tolerance of the species equals the median; although this mayreduce power, it leads to a simple exact conditional test.

For a species, calculate m+ and m−, which are the numbers of substances for which thespecies tolerance respectively exceeds or is exceeded by the median of measured tolerances forthe substance. Under the null hypothesis, conditional on the number of trials m+ +m−, m+ hasa binomial distribution with success probability 1

2 . We compute the two-tailed probability ofobtaining a value as extreme as the observed m+.

Results from applying this test to the DNIPHE database are displayed for the 10 species withthe smallest P-values in Table 1. One should be careful when interpreting the table. There isstrong evidence against exchangeability but it does not guarantee that Oncorhynchus mykiss isthe only such species presenting such a feature nor that it is the most, for want of a better word,biased species although it does identify it as a candidate. Clearly, there is more power to detectnon-exchangeability when m is large but there are also species in Table 1 which have not beentested very often. Note that, even if we apply the highly conservative Bonferroni correction toadjust the minimum P-value for multiple testing, the result is 172×3:9×10−15 =6:7×10−13.

Probabilistic Ecotoxicological Risk Assessment 249

Table 1. Species with the smallest P -values for the sign test†

Species m m+ +m− m+ m+=(m+ +m−) P-value

Oncorhynchus mykiss 344 301 83 0.28 3:9×10−15

Carassius auratus 76 69 56 0.81 1:7×10−7

Cyprinus carprio 166 150 103 0.69 5:6×10−6

Heteropneustes fossilis 36 36 31 0.86 1:3×10−5

Oncorhynchus clarki 42 41 10 0.24 1:5×10−3

Pimephales promelas 160 147 93 0.63 1:6×10−3

Carassius carassius 25 23 19 0.83 2:6×10−3

Channa punctatus 17 16 14 0.88 4:2×10−3

Clarias batrachus 17 16 14 0.88 4:2×10−3

Salvelinus namaycush 35 33 8 0.24 4:6×10−3

†m is the number of substances tested for the species; m+ and m− are the numbers ofsubstances where the tolerance for the species respectively exceeds or is exceeded by themedian.

Table 2. Species with the smallest P -values for the rank sum test

Species m P-value Effect size

Oncorhynchus mykiss 344 8:6×10−12 −0.42Heteropneustes fossilis 36 1:9×10−7 0.83Carassius auratus 76 3:1×10−5 0.68Salvelinus fontinalis 33 1:3×10−4 −0.58Carassius carassius 25 1:6×10−4 0.85Oncorhynchus clarki 42 3:6×10−4 −0.61Clarias batrachus 17 4:0×10−4 0.91Salvelinus namaycush 35 2:4×10−3 −0.59Channa striata 10 3:9×10−3 0.73Perca flavescens 29 6:5×10−3 −0.38

3.3. Rank sum testAs in the standard situation of comparing two populations, the rank sum test that is proposedhere should be more powerful than the sign test. For species j, define the test statistic to be thesum of rij over those substances for which the species has been tested. In effect, this gives moreweight to substances for which more species have been tested. Conditional on ni, under the nullhypothesis, each rij is uniformly distributed on the integers from 1 to ni, provided that there areno ties, and is independent for different values of i.

The exact null sampling distribution of the test statistic is computationally intractable butis easily approximated, either by Monte Carlo methods or a central limit theorem basednormal approximation using the theoretical mean and variance, which are easily obtained underthe null hypothesis in the absence of ties. The difficulty with the former is that many of ourP-values are very small and would require very many Monte Carlo repetitions. However, thisis likely to happen only when mj is large when we would expect the normal approximationto be more effective. As our activity is largely exploratory, we simply show P-values from thenormal approximation in Table 2 for the DNIPHE database. Monte Carlo simulation with 10000repetitions did not give significantly different P-values; therefore, we did not attempt to adjustthe normal approximation for ties. Also shown is an effect size for each species obtained bystandardizing each rij by using the mean and standard deviation of the null discrete uniform

250 P. S. Craig, G. L. Hickey, R. Luttik and A. Hart

distribution and computing the average value for each species. It provides some informationabout the average position of a species across a population of substances.

Interpretation of Table 2 is subject to the same caveat as for Table 1. It should be seen as pro-viding further evidence of the apparent non-exchangeability of Oncorhynchus mykiss tolerances.Many of the same species appear and for those species the effect sizes in Table 2 are consistentwith the relative sizes of m+ and m− in Table 1. The appearance of other species indicates thatthe two tests emphasize different aspects of departures from exchangeability.

3.4. Focusing on Oncorhynchus mykissIt is quite plausible that the exchangeability assumption is untenable from the perspective ofstatistical modelling and that all species are in fact non-exchangeable; if we eliminate all theOncorhynchus mykiss data from the database we still find clear evidence of non-exchangeabilityfor the remaining species, based on both tests.

Instead we concentrate on the case of a single non-exchangeable species because our goalis tractable and useful decision rules rather than better statistical modelling. We consider thepossibility of allowing for multiple non-exchangeable species in our final discussion. Our choiceof Oncorhynchus mykiss as the single non-exchangeable species is justified by its special role incurrent regulation. It is a standard test species and therefore has greater potential than mostspecies to influence risk assessment outcomes.

Aldenberg et al. (2002) showed that the rate at which the ELC changes as we perturb a singlelog-tolerance is greater for those log-tolerances which are less than the sample mean than forthose which are greater. Therefore, non-exchangeability of Oncorhynchus mykiss deserves moreattention than, for example, non-exchangeability of Carassius auratus (the goldfish), which isshown by Tables 1 and 2 to have a tendency to be less sensitive on average.

4. Modelling

We now suppose that there is a single special species which has non-exchangeable tolerancevalues. We revise our notation so that yi

† denotes the log-tolerance of the special species forsubstance i and yij the log-tolerance for the other species.

Under a priori exchangeability, the standard model is that yij are independently sampled fromN.μi,σ2

i /. We alter this only for the special species for which we specify y†i ∼N{μi −k, .φσi/

2}.Here k and φ are respectively location and scale adjustments and may be interpreted as speci-fying the predictive distribution for y† if μ and σ were to be known for a substance. They applyto multiple substances as only by so doing can we give them identifiable meaning; to be precise,k and φ2 are respectively the averages across substances of μ− y† and .y† + k −μ/2=σ2. Ofcourse, there may be scientific grounds to have groups of non-exchangeability parameters fordifferent classes of chemical, e.g. by the mode of action, but no available data support this atpresent.

Our model for non-exchangeability derives from a different model proposed by the EuropeanFood Safety Authority (2005), for which the expected value of y

†i was μi − k′σi. In that model,

scaling the offset k′ of the mean by the standard deviation means that the expected percentileof the special species in the SSD is unaffected by variability of the standard deviation betweensubstances. The European Food Safety Authority (2005) model may be intuitively more appeal-ing but we are not aware of any argument of principle favouring it. Moreover, unlike that model,our model leads later to tractable decision rules which are a key goal in this work. In Section 8,we assess whether the data favour one model over the other.

Probabilistic Ecotoxicological Risk Assessment 251

Obtaining values (or distributions) for k and φ requires the use at some stage of a databasesuch as that provided by the DNIPHE or of expert judgements. There is not uniform agreementabout the role of such databases in risk assessment. It is clear that their use is acceptable forsome purposes, such as the detection of non-exchangeability and therefore for estimation of kand φ, but some consider other uses to be unacceptable, e.g. construction of prior distributionsfor μ and σ by considering them to be drawn, along with μi and σi, from hyperpopulations ofmeans and standard deviations. The lack of agreement in this area means that we consider twobehavioural models in what follows.

(a) In model M1, μ and σ are unknown and varying between substances; the database isnot used to provide prior information about μ and σ. See, for example, Aldenberg andJaworska (2000).

(b) In model M2,μandσ are unknown and varying between substances;σ is assumed sampledfrom an inverse gamma distribution with hyperparameters α (shape) and β (rate); a data-base for relevant other substances is available to provide information about α and β; thedatabase is not used to provide prior information about μ. See European Food SafetyAuthority (2005).

Models M1 and M2 are not the only proposals in the literature. Aldenberg and Luttik (2002)supposed that μ varies but that σ does not and suggested determining a precise value for σ fromexpert opinion or a suitable database. The European Food Safety Authority (2005) consideredconsequences of uncertainty in estimating σ. However, there seems to be little justification forthe assumption that σ does not vary, even for narrow definitions of chemical classes.

Under model M1, each risk assessment is independent of others (apart from the sharing ofevidence concerning the non-exchangeability parameters). This satisfies those who are wary ofusing evidence from previous assessments to form prior judgements. However, the small amountof data that are available for a typical risk assessment means that there will often be considerablebenefit in exploiting previous experience to stabilize the estimate of σ for the current substanceby incorporating the evidence about variation in values of σ from a database. No hyperpopula-tion of means is proposed in model M2 as we have found the user community to be resistant tothe idea. Moreover, there is less to be gained than for the standard deviations as the DNIPHEdatabase shows that variation in μ is high relative to typical values of σ, so that any proper priorfor μ would typically be diffuse relative to the likelihood.

5. Hyperparameter estimation

There are two groups of hyperparameters: the non-exchangeability parameters k and φ whichappear in both model M1 and model M2 and the heterogeneity parametersα and β which applyonly to M2. In both cases, we use θ as a shorthand for the hyperparameters.

We distinguish two groups of substances for which data may exist although they may not nec-essarily be publicly accessible. G1 is the group of substances, which are deemed to be relevant tothe new substance, for which the tolerance of the special species has been measured. Under modelM2, we also need the collection G2 of substances which are considered relevant for estimating αand β. Note that, under model M2, we must simultaneously estimate the non-exchangeabilityand heterogeneity parameters as they are linked through the likelihood. We shall assume thatG1 is a subset of G2; although possible, it seems unlikely that substances would be consideredrelevant for estimation of non-exchangeability parameters but not for heterogeneity param-eters. This assumption also simplifies the specification of prior distributions. In our example,as in European Food Safety Authority (2005), we take G2 to be the complete collection of

252 P. S. Craig, G. L. Hickey, R. Luttik and A. Hart

substances in the DNIPHE fish database and G1 to be the subset of all those where toleranceswere measured for Oncorhynchus mykiss and at least two other species. This restriction, whichwas applied for direct comparability with a frequentist estimation approach in European FoodSafety Authority (2005), is not strictly necessary but provides more reliable information aboutthe parameters.

In principle, under either behavioural model, one might elicit proper prior distributions forthe hyperparameters from a risk manager but this is unlikely in practice as, aside from lack oftime and expertise, it could constitute a conflict of interest and the risk manager would poten-tially be exposed to pressure from vested interests. In any case, we expect there to be significantamounts of data in both G1 and G2, and so we do not expect inferences to be very sensitive tothe choice of prior distributions for the hyperparameters. Under model M1, we use independentimproper prior distributions π.k,φ/∝1 and π.μi,σ2

i /∝σ−2i for i∈G1. The latter is seen by many

as the practical version of the Jeffreys prior and has been used in other Bayesian SSD litera-ture, e.g. Aldenberg and Jaworska (2000) and European Food Safety Authority (2005), where,as a consequence, frequentist and Bayesian risk calculations coincided. Under model M2, thedistribution of σi is determined by α and β and we again take p.μi/∝1. For the heterogeneityhyperparameters, we take p.α,β/∝1 for α> 0 and β> 0.

With these prior specifications, substances are conditionally independent given the hyper-parameters and so their joint posterior distribution is a sufficient summary of the databasewhen considering a new substance. This sufficiency means that the posterior distributions canbe published and used without requiring open access to the databases from which they arederived (as was the case in European Food Safety Authority (2005)). In principle the posteriordistributions should be updated whenever more data become available, e.g. every time that anew substance is assessed. In practice, however, the same distributions will be used for manyrisk assessments for several reasons:

(a) unavailability of raw data for re-estimation on the fly;(b) infeasibility of sharing all data to ensure that everyone makes the same updates;(c) lack of resources to reappraise values.

Under both model M1 and model M2, the prior distribution and likelihood are now fullydefined but we need to integrate out the nuisance parameters {μi,σ2

i } to obtain the unnor-malized marginal posterior density of the hyperparameters. The posterior densities are brieflyderived in Appendix A.1 and may be maximized numerically to obtain maximum a posteriori(MAP) estimates and the corresponding Hessian matrix.

Estimates and approximate posterior standard deviations are shown in Table 3. Values ofk andφ are similar for models M1 and M2, suggesting that information about non-exchangeabil-ity is largely uninfluenced by the introduction of a model for variance heterogeneity. Uncertain-ties attached to the estimates do not seem large; consequences for determination of ELC valuesare considered more formally in Section 7. The positive estimate of the offset hyperparameterk suggests that Oncorhynchus mykiss tends to be a sensitive species having tolerance below themedian of the SSD. Interpretation of φ is more difficult; however, φ< 1 suggests that the SSDpercentile for Oncorhynchus mykiss is less variable than for other species and leads to increasedweight for the corrected tolerance in estimating the mean of the SSD. Overall, the estimates areconsistent with previous informal suggestions that Oncorhynchus mykiss tends to be sensitive.

Oursomewhatarbitrarychoiceofpriordistributionforthehyperparametersledustoinvestigatesensitivity to that choice by trying other prior distributions. For k we tried p.k/∝1=.0:01+k2/,which strongly favours values of k near 0, and p.k/ ∝ 0:01 + k2, which strongly favours largevalues of k. Similarly, for the other components of θ, which are all positive, we tried p.θi/∝ θi

Probabilistic Ecotoxicological Risk Assessment 253

Table 3. MAP estimates for hyperparameters k , φ, α and β, with posteriorstandard deviations in parentheses

Model k φ α β

M1 0.195 (0.019) 0.702 (0.073) — —M2 0.205 (0.030) 0.656 (0.066) 1.52 (0.24) 0.315 (0.076)

and p.θi/∝1=θi. There were four alternative prior distributions for model M1 and 16 for M2.In all cases the MAP estimates differed from those in Table 3 by less than half the posteriorstandard deviation shown.

6. Decision rules

For determining the ELC in the context of species exchangeability, various decision rules, relatedto estimation of HCp for a specified value p of interest, have been proposed in the literature. Weconsider two existing rules and their generalization to non-exchangeability under both modelM1 and model M2. Generally, risk is measured or controlled via the ‘potentially affected frac-tion’ (PAF), which is the proportion of species whose tolerance lies below the ELC, with someintention to keep the PAF near or below p%. The choice of p is seen to be a policy decisionfor the risk manager; the standard requirement is 5. However, the justification for this choicecomes largely from some validation studies carried out afterwards to examine the consequences.A high PAF corresponds to a high risk for the assemblage of species.

6.1. Risk approaches for determinationWe denote the proposed log10.ELC/ for a new substance by δ. In all the cases that we consider,it can be shown (see Appendix A.2) that δ is of the form μ−κpσ. Here, μ and σ are naturalestimates of μ and σ from the data for the new substance whereas κp does not depend on thesedata, although it does always depend on n and p and the risk measure. κp might be described asa standardized assessment shift so that 10κpσ is the variable assessment factor that was referredto in Section 2.2. Risk managers should find the rules appealing and transparent for reasonsdiscussed later.

In all cases, μ is the standard weighted least squares unbiased estimate of μ, obtained bycorrecting the measurement for the special species to remove the bias k and increasing its weightto allow for the reduction in variability that is implied by φ. Under model M1, σ2 is simply thecorresponding weighted least squares unbiased estimate of σ2 whereas under M2 it is a weightedcombination of that estimate and the prior mean for σ2 implied by α and β. Consequently, onthe original concentration scale the value that is determined for the ELC is a geometric meanof the adjusted toxicity data divided by the aforementioned variable assessment factor. Thedifference between models M1 and M2 is that the latter stabilizes the variability estimate σ byborrowing strength from the pool G2 of existing data; a corresponding adjustment is requiredto the value of κp which then depends on α.

Simple rules based on exchangeable versions of model M1 were proposed by Aldenberg andJaworska (2000) and the European Food Safety Agency (2005). The latter also consideredthe European Food Safety Agency (EFSA) rule in the context of exchangeable model M2;we determine the Aldenberg–Jaworska (AJ) version here for completeness (see Appendix A.2

254 P. S. Craig, G. L. Hickey, R. Luttik and A. Hart

for details). In what follows, note that PAF.δ/ = Φ{.δ−μ/=σ}, where Φ.·/ is the cumulativedistribution function of the standard normal distribution and we write PAF.δ/ to emphasizedependence on the decision rule.

The AJ approach is to demand high probability that PAF.δ/ is less than p%. The risk man-ager specifies p, which is often taken to be 5 in practice, and a credibility requirement γ; thedecision rule is to find δ so that γ is the probability that PAF.δ/ is less than p=100. Noting thatPAF.δ/�p=100 if and only if δ� log10.HCp/, δ satisfies

P.δ�μ−Kpσ/=γ .1/

where Kp is the .100 − p/th percentile of the standard normal distribution; the resulting κp

depends on γ. The probability in equation (1) is computed with respect to the posteriordistribution of μ and σ for the new substance. It has been suggested by some that γ= 0:95may be an appropriate choice (e.g. Wagner and Løkke (1991)). However, current EuropeanUnion guidance (e.g. European Chemicals Agency (2008b)) requires results for γ= 0:50 to bepresented along with those for γ=0:25 and γ=0:75.

The EFSA approach is to try to control PAF.δ/ to be near some suitable value p% which therisk manager specifies. Then δ is the value for which the expected PAF is p=100 and so satisfies

E[Φ{.δ−μ/=σ}]=p=100 .2/

where again the expectation is with respect to the posterior distribution of μ and σ. The valueof p will generally need to be smaller, e.g. p = 1, for the EFSA approach to achieve similarprotection to that obtained by the AJ rule with p=5 when γ=0:95.

To obtain the simple form δ= μ− κpσ, we must assume that the hyperparameters θ areknown or specified precisely so that we actually compute the probability in equation (1) andthe expectation in equation (2) by using the posterior distribution of μ and σ conditional on θ.Consequences of uncertainty about θ are addressed in Section 7.

Several features of these rules make them sensible and easy to apply:

(a) each rule is easily computed and tables for κp can be produced for those who lack thenecessary expertise or software (see Aldenberg and Jaworska (2000), Table 1, page 5);

(b) each rule has the same form as in the exchangeable species case;(c) the AJ rule is a Bayes rule under generalized absolute loss (Hickey et al., 2009);(d) the rules hold from the frequentist perspective in the sense that equations (1) and (2)

remain valid if the calculations are with respect to the sampling distribution of the toler-ance data for the substance, and also the sampling distribution of σ in the case of modelM2, instead of the posterior distribution of μ and σ.

6.2. Consequences of non-exchangeabilityApplication of revised decision rules will ultimately yield different consequences, but it is notimmediately apparent to what degree. Fig. 2 compares the values of δ that were obtained foreach revised rule with those calculated under exchangeability for each substance in the G1 data-base; results are shown for p=5 for each substance i in G1 for the AJ (γ=0:50, 0:95) and EFSArules; we plot δ calculated under exchangeability against the difference (to assist interpretation)between the values of δ obtained under non-exchangeability and exchangeability.

The horizontal broken lines indicate where the decision rules are equal; points above theline indicate substances for which the revised ELC is higher than the original, i.e. where it isecologically less conservative. An important observation for regulators is that the new rules,although correcting for a single sensitive species, do not necessarily lead to higher ELCs. In

Probabilistic Ecotoxicological Risk Assessment 255

δex.

δ non

-ex.

−δ e

x. −1.0

−0.5

0.0

0.5

−0.8

−0.6

−0.4

−0.2

0.0

0.2

−10 −5 0 −4 −3 −2 −1 0 1 2 −6 −4 −2 0 2

(a) (b) (c)

(d) (e) (f)

Fig. 2. Consequences of non-exchangeability for pD5 for all substances in G1 (δ is derived under exchange-ability versus the difference between δs derived under non-exchangeability and exchangeability): (a) AJ rule,γD 0:95, model M1; (b) AJ rule, γD 0:50, model M1; (c) EFSA rule, model M1; (d) AJ rule, γD 0:95, modelM2; (e) AJ rule, γD0:50, model M2; (f) EFSA rule, model M2

fact, the δ-values that are based on non-exchangeability are higher than their exchangeablemodel versions for between 60% and 68% of assessed substances (Fig. 2) for the AJ .γ=0:50/

and EFSA rules, and between 52% and 56% for the AJ (γ= 0:95) rule. This is due partly tothe fact that, although the offset hyperparameter k is positive, the variance estimate also changes,leading sometimes to higher and sometimes to lower values of δ. The largest differencesoccur for substances where the non-exchangeable decision rule is lower than the correspond-ing exchangeable version and under model M1 this feature is more pronounced for the AJ(γ= 0:95) rule as the change of model has more effect in the tails of the posterior distributionfor HC5.

There is some double counting of data here since the estimated hyperparameters θ derivefrom the same database as that used to explore the consequences. However, the estimates arebased on many substances and would change relatively little on omitting one. Moreover, theestimates are those which will be used in the decision rules that we propose for risk managersand it is the consequences of the change to those rules which we wish to evaluate.

7. Consequences of ignoring hyperparameter uncertainty

In Section 6, we assumed that hyperparameter uncertainty could safely be ignored, resulting ina simple form for the rules for determining the ELC. Here we seek to show that the rules thatare derived still perform well even if we allow for hyperparameter uncertainty. The simple formarose from solving equations (1) and (2) making the approximation of using the posterior dis-tribution of μ and σ conditional on taking the hyperparameters θ fixed at their MAP estimatesin place of the marginal posterior distribution of μ and σ. Approximate numerical solution ispossible when θ is uncertain but it is not easy to ensure reliability or accuracy.

However, the left-hand sides of equations (1) and (2) can each be seen as measuring per-formance of a chosen value of δ and the right-hand sides as specifying intended performance.For the AJ rule, the performance measure is the probability that the PAF is less than p; for the

256 P. S. Craig, G. L. Hickey, R. Luttik and A. Hart

EFSA rule, it is the expected PAF. Consequences of ignoring hyperparameter uncertainty foreach decision rule may be assessed by taking δ fixed at the value that is used for each substancein producing the corresponding panel in Fig. 2 and accurately computing the left-hand side ofequation (1) for the AJ rule or equation (2) for the EFSA rule to obtain attained performance.The result may be compared with the intended value: γ for the AJ rule or p for the EFSA rule.If an attained value is greater or lower than intended, ignoring hyperparameter uncertainty hasled respectively to higher or lower than intended protection of the ecological community.

Computation of attained performance for each substance is simple once we have a largerandom sample of values from the posterior distribution of θ; one calculates the performance ofδ for each value of θ and then averages. We took a Markov chain Monte Carlo sample of 10000values from the posterior density of the hyperparameters under each behavioural model, usinga Metropolis random-walk sampler with a normal proposal distribution based on the Laplaceapproximation to the posterior, which can be performed by using regular statistical software;see for example Albert (2007), page 110.

Fig. 3 shows attained performance for each substance in G1 for both behavioural models withp = 5. The same three ELC rules are considered as in Fig. 2: AJ (γ= 0:95), AJ (γ= 0:50) andEFSA. In each plot the intended performance level is emphasized by a broken line. In inter-preting differences between intended and attained performance, we must recognize that this isintermediate tier ERA, that the chosen value of p = 5 has no direct ecological meaning andthat the actual PAF will always be highly variable between substances owing to the relativelysmall numbers of species tested. With the exception of one substance, attained performanceunder model M1 does not differ from intended performance in any practical sense; for examplethe difference between 50% credibility and 48% credibility is negligible. Even in the exceptionalcase, the difference may be acceptable to risk managers. Under model M2, there are somewhatlarger typical differences between attained and intended performance but these are still tolera-ble in our opinion. In all cases, it appears that slight underprotection occurs more often thanoverprotection.

Earlier, we examined the sensitivity of hyperparameter estimates to our choice of prior dis-tribution for the hyperparameters as we cannot be sure that our chosen prior is the best repre-sentation of prior knowledge. We also evaluated the attained performance for each substanceof each δ shown in Fig. 2 by using the posterior distribution for μ and σ obtained using each

Model

Perfomance0.

925

0.93

00.

935

0.94

00.

945

0.95

00.

400.

420.

440.

460.

480.

500.

050

0.05

50.

060

0.06

50.

070

M2

M1

(a) (b) (c)

Fig. 3. Boxplots of per-substance attained performance for decision rules obtained by ignoring hyper-parameter uncertainty (attained performance is expected PAF for the EFSA rule and credibility that PAF<p%for the AJ rule, computed allowing for hyperparameter uncertainty): (a) AJ rule, γD0:95; (b) AJ rule, γD0:50;(c) EFSA rule

Probabilistic Ecotoxicological Risk Assessment 257

of the alternative priors described in Section 5. Naturally, there were some differences betweenattained and intended performance. Nevertheless, for the majority of the alternative priors, thedifferences were small, especially under model M1, and even in the worst case the differenceswere less than 20% of intended p for the EFSA rule and of intended 1 − γ for the AJ rule. Ineffect, the rules were still attaining the right magnitude of performance despite the fact thatthe original prior was being used for determining δ and the alternative priors for computingattained performance.

8. Comparison of models for non-exchangeability

In Section 4, we introduced our model for non-exchangeability and noted its tractability com-pared with the model proposed in European Food Safety Authority (2005). We now considerthe evidence in favour of one over the other from other perspectives. We denote by D1 themodel that was introduced by European Food Safety Authority (2005), with non-exchange-ability hyperparameters k′ and φ′ and by D2 our model with parameters k and φ. Details ofmodels D1 and D2 were provided in Section 4. There we did not distinguish φ from φ′; however,although apparently the same, φ′ and φ have different meanings due to the difference betweenmodels D1 and D2 in the treatment of the mean for the special species. Table 4 gives estimatesunder model D1 corresponding to those under D2 given earlier in Table 3. In principle, undermodel M2, estimates of α and β differ for models D1 and D2 owing to the different treatmentof non-exchangeability; however, the tabulated values coincide.

Suppose that we take a substance out of the database G1 and consider it to be the substanceunder current assessment. We compare the two non-nested non-exchangeability models D1 andD2 for each substance by using a Bayes factor (Bernardo and Smith 1994; Kass and Raftery,1995) to measure the evidence in favour of model D1 against D2. The Bayes factor for a sub-stance is the ratio of the marginalized likelihoods under models D1 and D2 where each mar-ginalized likelihood is the expectation, calculated by using the prior distribution of μ and σ, ofthe conventional likelihood for the data for the substance. Evidence provided by a Bayes factorin favour of model D1 or D2 may be interpreted by using a descriptive categorization such asthat proposed by Kass and Raftery (1995), section 3.2, which provides an intuitive and practicalapproach to model comparison for applied Bayesian statistics. There are some technical issueswhen applying Bayes factors with improper prior distributions and we must treat the hyper-parameters as fixed; details are given in Appendix A.3 along with the formula for the Bayes factor.

Fig. 4 shows the Bayes factors for individual substances separately for models M1 and M2.Under M2, all lie in a range that was deemed by Kass and Raftery (1995) not to indicate asignificant advantage for either model. The same is true for most substances under M1 althoughthere are a few in each direction strongly favouring model D1 or D2. However, 131 of 220 Bayesfactors are positive under M1 and 141 under M2, which may suggest some overall preferencefor model D1.

Table 4. MAP estimates under model D1 for hyperparameters k0, φ0,α0 and β0, with posterior standard deviations in parentheses

Model k′ φ′ α′ β′

M1 0.458 (0.060) 0.642 (0.076) — —M2 0.452 (0.056) 0.604 (0.065) 1.52 (0.22) 0.315 (0.069)

258 P. S. Craig, G. L. Hickey, R. Luttik and A. Hart

100500 150 200

−1.

5−

1.0

−0.

50.

00.

51.

0

Substance

log 10

Bay

es fa

ctor

log 10

Bay

es fa

ctor

500 100 150 200

−0.

2−

0.1

0.0

0.1

0.2

Substance(a) (b)

Fig. 4. Bayes factors for model D1 versus model D2 for substances in G1: (a) model M1; (b) model M2

A simple summary of the overall evidence for D1 against D2 is the overall Bayes factor, whichis obtained as the product of the per-substance Bayes factors since substances are condition-ally independent when θ is fixed. Under model M1, this is 2.6, which Kass and Raftery (1995)described as ‘not worth a bare mention’, whereas under M2 it is 426, which they considered‘decisive’ in favour of D1. However, it is unclear how much ignoring hyperparameter uncer-tainty undermines the calculation, especially given that the estimates are based on the samedata. Unfortunately, there is little expert knowledge on which to base proper prior distributionsand none which would prevent the Bayes factor from depending arbitrarily on the relative priordensity of k and k′. We are left with the facts that

(a) model D2 leads to tractable risk calculations,(b) individual substances do not distinguish model D1 from D2 and(c) the overall picture slightly favours model D1 over D2 but only if the same form of non-

exchangeability is assumed to hold throughout. Model D2 is our pragmatic choice.

9. Discussion

We have provided evidence to support a previous informal view that an important test species,Oncorhynchus mykiss (the rainbow trout), fails to satisfy the key exchangeability assumptionin the SSD approach to ecotoxicology. We then showed how to adapt current modelling andprocedures to allow for a single species with non-exchangeable tolerance, while retaining twokey features: simplicity of decision rules and no need to share databases. However, the evidenceclearly suggests that more than one species may be non-exchangeable.

In Section 2.3, we explained the difficulties in using the apparently natural approach of acrossed random-effects model. In short, it would not lead to simple decision rules, it wouldrequire more sharing of data and would require careful reconsideration of the SSD concept,thereby violating our goal to seek procedures which would be sufficiently transparent to allowadoption by risk managers. We do not know whether it would lead to better decision ruleperformance. Our solution has the merits that it addresses the problem of non-exchangeabilityfor the standard test species, that it is a relatively straightforward adaptation of current meth-odology and that it seems to be reasonably well supported by data. Crucially, it is sufficientlysimple that risk managers need not radically alter their approach.

Probabilistic Ecotoxicological Risk Assessment 259

Mathematically, and to some extent computationally, it is straightforward to extend the modeland decision rules in this paper to allow for multiple special species. However, this introducestwo fundamental problems. The first is to decide which and how many species should be treatedas having non-exchangeable tolerances. It is likely that disagreement on this issue would makeit difficult to establish standard decision rules. The second, and more serious, conceptual prob-lem is that the SSD is supposed to be a surrogate for ecosystems. In our current proposal, theSSD does not describe the special species and protection is still achieved purely in terms ofthe SSD although the special species contributes information. In removing more species fromthe SSD, we would eventually have to consider how to use the SSD together with the specialspecies’s tolerances to achieve protection goals.

An alternative would be to model SSDs as mixtures (Grist et al., 2006; Hickey et al., 2008)where species in the ecological community are grouped taxonomically. Although it would notaccount fully for species non-exchangeability, it might be appropriate where sensitive groupsare known to be measured. It has appeal for complex and diverse communities, but it wouldneed additional knowledge of taxonomic weightings, more data and specialist statistical soft-ware for working with mixture distributions. Consequently, such models are unlikely to becomecommonplace tools for intermediate tier ERA.

Current ERA procedures generally use only the data for the substance under consideration.Decision rules based on hyperparameters estimated from multisubstance databases may notimmediately appeal to the user community but at least do not require general sharing of data-bases. However, a conventional Bayesian approach would involve updating hyperparameters asmore data become available. That would require someone to augment databases and to recom-pute hyperparameter estimates on an on-going basis. In our proposal, the hyperparameterswould be static and used over a significant period of time for many risk assessments. This is notintended to improve on the standard paradigm but is simply pragmatic. It removes the require-ment for those actively involved in ERA to use sophisticated statistical software and allowsusers instead to use spreadsheet software and publishable look-up tables, since more complexanalysis would only be performed occasionally by statisticians. There remains the issue of howand when databases would be updated but that is a problem for the ERA community and notfor statisticians.

Acknowledgements

Hickey thanks the Engineering and Physical Sciences Research Council and the Food and Envi-ronment Research Agency for funding his research during his doctoral research. We thank TomAldenberg, Rijksinstituut voor Volksgezondheid en Milieu, for his comments on a draft ofthis paper. We are also grateful to the Joint Editor, Associate Editor and two referees for theirdetailed comments which led to considerable improvements.

Appendix A

A.1. Parameter estimationHere we give details of the posterior distributions for the hyperparameters under models M1 and M2.In the interest of clarity, we extend the notation of Section 4 by writing τi = 1=σ2

i and we note thatthe transformed prior density is p.τi/ ∝ 1=τi for τi > 0. We also denote the database of toxicity dataas Y. The collection of ni − 1 species tested with substance i, but not including the special species, isdenoted JÅ

i .Under model D2, for both model M1 and model M2, define

260 P. S. Craig, G. L. Hickey, R. Luttik and A. Hart

μi =φ−2.y

†i +k/+ ∑

j∈JÅ

yij

φ−2 +ni −1.3/

and

σ2i = 2β+ .ni −1/σ2

i

2α+ .ni −1/, .4/

where

σ2i = 1

ni −1

{φ−2.y

†i +k − μi/

2 + ∑j∈JÅ

.yij − μi/2}

, .5/

where, for model M1, α=β= 0. Note the implicit dependence on hyperparameters and also that μi andσ2

i are the usual weighted least squares unbiased estimators of μi and σ2i . For model M2, σ2

i is also unbi-ased from the frequentist viewpoint if we incorporate drawing σ2

i from an inverse gamma population ofvariances into the sampling scheme.

Under models D2 and M1, writing μG1 and τG1 as shorthand for the vectors of the μi and τi for i∈G1respectively, and vt for the number of substances in Gt .t = 1, 2/, we easily obtain the likelihood functionfor all the unknown parameters:

L.k,φ,μG1 , τG1 /∝ ∏i∈G1

φ−1τni=2i exp

[− 1

2 τi

{φ−2.y

†i −μi +k/2 + ∑

j∈JÅi

.yij −μi/2}]

=φ−v1∏

i∈G1

τni=2i exp[− 1

2 τi{.φ−2 +ni −1/.μi −μi/2 + .ni −1/σ2

i }]:

Multiplying by the joint prior density that is defined in Section 5 for k, φ, μi and τi (i ∈G1) yields theunnormalized posterior distribution and, after integration with respect to each μi and τi, we obtain theposterior density for k and φ:

p.k,φ |Y/∝φ−v1∏

i∈G1

Γ.αi/

βαi

i

1√.φ−2 +ni −1/

, .6/

where αi = 12 .ni −1/ and βi = αiσ

2i . Maximizing this function with respect to its arguments subject to the

constraint α=β=0 determines the joint MAP estimator for k and φ.Under models D2 and M2, we use the additional v2 −v1 substances in G2\G1 and estimate α, β, k and

φ. Momentarily continuing to treat the τi as parameters, the likelihood is now

L.k,φ,μG1 , τG1 /∏

i∈G2\G1

τni=2i exp[− 1

2 τi{ni.yi −μi/2 + .ni −1/s2

i }]

where μG2\G1 and τG2\G1 are similarly defined as earlier, and yi and si are the sample mean and standarddeviation of yij , ∀j ∈Ji. Now, we must multiply by the sampling density,

p.τi |α,β/= βα

Γ.α/τα−1

i exp.−βτi/

for i ∈G2, recalling G1 ⊆G2 and integrate with respect to each τi > 0 to obtain the true likelihood undermodel M2. However, we then intend to multiply by the prior density p.k,φ,α,β,μG2 / ∝ 1 and integratewith respect to each μi to obtain the marginal posterior and it is easier to reverse the order of integration(as earlier) to obtain

p.α,β, k,φ |Y/∝{

βα

Γ.α/

}v2

φ−v1

{ ∏i∈G2

Γ.αi/

βαi

i

} ∏i∈G1

1√.φ−2 +ni −1/

, .7/

where αi =α+ αi and βi =β+ βi for i∈G1.⊇G2/.Under model D1, μi and σ2

i in equations (3) and (4) are now functions of τi as k must be replaced byk′=

√τi and we also replace α by α′, β by β′ and φ by φ′. Consequently, when calculating the equivalent

of expressions (6) and (7), the integrals with respect to μi can still be done in closed form but integrationwith respect to τi must be approximated numerically.

Probabilistic Ecotoxicological Risk Assessment 261

A.2. Decision rules under model D2For model M2, it is a straightforward generalization of standard Bayesian calculations for normal samplingto obtain the posterior distribution of μ and σ2—the parameters of an SSD for a new substance—con-ditional on known θ and tolerance measurements for a substance: 1=σ2 has a gamma distribution withshape α=α+ 1

2 .n−1/ and mean 1=σ2 and, given σ, μ has a normal distribution with mean μ and varianceσ2=.φ−2 +n−1/, given by equations (3) and (4) respectively after dropping the subscript i. Under modelM1, σ2 simplifies to σ2.

Decision rules are determined to be of the form μ−κpσ for both the AJ and the EFSA methods. Thisfollows from two standard results for the normal–inverse gamma posterior distribution for μ and σ2:

(a) μ−Kpσ has a rescaled non-central t-distribution;(b) the predictive distribution of a further observation is a relocated and rescaled t-distribution.

For the AJ method, the decision rule follows directly from (a), whereas, for the EFSA method, we need tonote that E{PAF.δ/} is the probability that the tolerance of a random species lies below δ, which is givenby result (b).

For the AJ rule, ψκp is the γth percentile of the non-central t-distribution with η= 2α+n− 1 degreesof freedom and non-centrality parameter ψKp, where ψ2 =φ−2 +n−1 is the total weight of the observa-tions. For the EFSA rule, κp=

√.1+ψ−2/ is the .100−p/th percentile of the (central) t-distribution with η

degrees of freedom. Note that κp-values differ for methods M1 and M2 and are non-comparable as theyare to be applied to different estimates of σ. For model M1, take α=β=0. Similarly, calculations underexchangeability may be recovered by taking k =0 and φ=1.

A.3. Bayes factorsFor Bayes factors for model D1 against model D2 for a new substance, first consider model M2. Let .k′,φ′/and .k,φ/ denote the estimated values of the non-exchangeability hyperparameters under models D1 andD2 respectively and let .α′,β′/ and .α,β/ be the respective variance heterogeneity parameters. We take thehyperparameters to be fixed in each mode because Bayes factors are generally undefined when improperpriors are used and also because, as in Section 6.2, the models that we propose for actual use have fixedhyperparameters. Next, recall that our prior distribution for μ is the improper uniform distribution on thereal line so that we may exploit expression (7) to obtain the terms for a single substance under model D2.With the form of the likelihood function given in Appendix A.1, we obtain the terms for a single substanceunder model D1, on which we can see that the Bayes factor in favour of model D1 over D2 is

β′ α′

βαΓ.α/

Γ.α′/φ√

.φ−2 +n−1/

φ′√.φ′−2 +n−1/

βα

Γ.α/

∫ ∞

0τ α

′−1 exp[− 12 τ{2β′ + .n−1/σ2.τ /}] dτ , .8/

where α and β are defined as underneath expression (7) in Appendix A.1 (omitting the subscript i),α′ =α′ + α, and σ2.τ / is given by equations (4) and (5) (omitting the subscript i) with k replaced byk′=

√τ , α by α′, β by β′ and φ by φ′. The integral may be evaluated straightforwardly by numerical quad-

rature to high accuracy. The Bayes factor for model M1 is given by expression (8), omitting the termβ′α′

Γ.α/=βα Γ.α′/ and taking α′ =α=0 and β′ =β=0 in the remainder.The prior distributions on μ and σ for model M1 and μ for model M2 are improper. However, following

Bernardo and Smith (1994), page 422, we argue that the Bayes factors are well defined as these parame-ters are identically operationally defined under models D1 and D2 with respect to a hypothetical infinitepopulation of exchangeable species in the SSD. In such contexts the Bayes factor that is obtained may beviewed as a limit of the Bayes factor that is obtained by using the same proper prior in the numerator anddenominator.

References

Albert, J. (2009) Bayesian Computation with R, 1st edn. New York: Springer.Aldenberg, T. and Jaworska, J. S. (2000) Uncertainty of the hazardous concentration and fraction affected for

normal species sensitivity distributions. Ecotoxicol. Environ. Safty, 46, 1–18.Aldenberg, T., Jaworska, J. S. and Traas, T. P. (2002) Normal species sensitivity distributions and probabilistic

ecological risk assessment. In Species Sensitivity Distributions in Ecotoxicology (eds L. Posthuma II, G. W. Suterand T. P. Traas), pp. 49–102. Boca Raton: Lewis.

262 P. S. Craig, G. L. Hickey, R. Luttik and A. Hart

Aldenberg, T. and Luttik, R. (2002) Extrapolation factors for tiny toxicity data sets from species sensitivity distri-butions with known standard deviation. In Species Sensitivity Distributions in Ecotoxicology (eds L. PosthumaII, G. W. Suter and T. P. Traas), pp. 103–118. Boca Raton: Lewis.

Aldenberg, T. and Slob, W. (1993) Confidence limits for hazardous concentrations based on logistically distributedNOEC toxicity data. Ecotoxicol. Environ. Safty, 25, 48–63.

Alexander, D. E. and Fairbridge, R. W. (1999) Encyclopaedia of Environmental Science. Amsterdam: Kluwer.Bernardo, J. M. and Smith, A. F. M. (1994) Bayesian Theory. Chichester: Wiley.De Zwart, D. (2002) Observed regularities in SSDs for aquatic species. In Species Sensitivity Distributions in

Ecotoxicology (eds L. Posthuma II, G. W. Suter and T. P. Traas), pp. 133–154. Boca Raton: Lewis.Duboudin, C., Ciffroy, P. and Magaud, H. (2004) Effects of data manipulation and statistical methods on species

sensitivity distributions. Environ. Toxicol. Chem., 23, 489–499.Dwyer, F. J., Mayer, F. L., Sappington, L. C., Buckler, D. R., Bridges, C. M., Greer, I. E., Hardesty, D. K., Henke,

C. E., Ingersoll, C. G., Kunz, J. L., Whites, D. W., Augspurger, T., Mount, D. R., Hattala, K. and Neuderfer,G. N. (2005) Assessing contaminant sensitivity of endangered and threatened aquatic species: I, acute toxicityof five chemicals. Arch. Environ. Contamn Toxicol., 48, 143–154.

European Chemicals Agency (2008a) Uncertainty analysis. In Guidance for the Implementation of REACH: Guid-ance on Information Requirements and Chemical Safety Assessment, ch R.19. Helsinki: European ChemicalsAgency.

European Chemicals Agency (2008b) Characterisation of dose [concentration]-response for environment. In Guid-ance for the Implementation of REACH: Guidance on Information Requirements and Chemical Safety Assessment,ch. R.10. Helsinki: European Chemicals Agency.

European Commission (2002) Guidance document on aquatic ecotoxicology in the context of the Directive91/414/EEC. Document SANCO/3268/2001, revision 4. European Commission, Brussels.

European Commission (2006) Regulation (EC) No. 1907/2006 of the European Parliament and of the Council of18 December 2006. Off. J. Eur. Un., L 396/1.

European Food Safety Authority (2005) Opinion of the Scientific Panel on Plant Health, Plant Protection Prod-ucts and their Residues on a Request from EFSA related to the assessment of the acute and chronic risk toaquatic organisms with regard to the possibility of lowering the uncertainty factor if additional species weretested. EFSA J., 301, 1–45.

Forbes, V. E. and Calow, P. (2002a) Extrapolation in ecological risk assessment: balancing pragmatism andprecaution in chemical controls legislation. BioScience, 52, 249–257.

Forbes, V. E. and Calow, P. (2002b) Species sensitivity distributions revisited: a critical appraisal. Hum. Ecol. RiskAssessmnt, 8, 1625–1640.

Goldstein, H. (1995) Multilevel Statistical Models, 2nd edn. London: Arnold.Grist, E. P. M., O’Hagan, A., Crane, M., Sorokin, N., Sims, I. and Whitehouse, P. (2006) Bayesian and time-inde-

pendent species sensitivity distributions for risk assessment of chemicals. Environ. Sci. Technol., 40, 395–401.Hickey, G. L., Craig, P. S. and Hart, A. (2009) On the application of assessment factors in ecological risk.

Ecotoxicol. Environ. Safty, 72, 293–300.Hickey, G. L., Kefford, B. J., Dunlop, J. E. and Craig, P. S. (2008) Making species salinity sensitivity distribu-

tions reflective of naturally occurring communities: using rapid testing and Bayesian statistics. Environ. Toxicol.Chem., 27, 2403–2411.

Kass, R. E. and Raftery, A. E. (1995) Bayes Factors. J. Am. Statist. Ass., 90, 773–795.Newman, M. C., Ownby, D. R., Mezin, L. C. A., Powell, D. C., Christensen, T. R. L., Lerberg, S. B. and Anderson,

B. A. (2000) Applying species sensitivity distributions in ecological risk assessment: assumptions of distributiontype and sufficient numbers of species. Environ. Toxicol. Chem., 19, 508–515.

Posthuma II, L., Suter, G. W. and Traas, T. P. (eds) (2002) Species Sensitivity Distributions in Ecotoxicology. BocaRaton: Lewis.

Raimondo, S., Vivian, D. N., Delos, C. and Barron, M. G. (2008) Protectiveness of species sensitivity distributionhazard concentrations for acute toxicity used in endangered species risk assessment. Environ. Toxicol. Chem.,27, 2599–2607.

Rand, G. M. (ed.) (1995) Fundamentals of Aquatic Toxicology: Effects, Environmental Fate and Risk Assessment,2nd edn. Philadelphia: Taylor and Francis.

Stephan, C. E. (2002) Use of species sensitivity distributions of water quality criteria for aquatic life by the U.S.Environmental Protection Agency. In Species Sensitivity Distributions in Ecotoxicology (eds L. Posthuma II,G. W. Suter and T. P. Traas), pp. 211–220. Boca Raton: Lewis.

Wagner, C. and Løkke, H. (1991) Estimation of ecotoxicology protection levels from NOEC toxicity data. Wat.Res., 25, 1237–1242.


Recommended