“A Bayesian Generalized Linear Model for the Bornhuetter-Ferguson Method of Claims Reserving,” R. J. Verrall, July 2004/AUTHOR’S REPLY

KATRIEN ANTONIO,* JAN BEIRLANT,[dagger] AND TOM HOEDEMAKERS[double dagger]


Professor Verrall nicely illustrates how Bayesian models can be applied to claims reserving within the framework of Generalized Linear Models (GLMs) and how they lead to posterior predictive distributions of quantities of interest. In this discussion we apply a Bayesian model in the context of discounted loss reserves. The outcomes of this approach are compared with results on approximations for the distribution of the discounted loss reserve when the run-off triangle is modelled by a GLM. These approximations are based on the concepts of comonotonicity and convex order and are explained in full detail in Hoedemakers et al. (2003) (for lognormal claims reserving models) and Hoedemakers et al. (2005) (for claims reserving using GLMs). The comonotonicity approach has been shown to provide elegant solutions to various other actuarial and financial problems involving the distribution of a sum of dependent random variables (check www.kuleuven.ac.be/insurance-research papers for more illustrations). We realize that the Bayesian posterior predictive distribution is a very general workhorse, which takes into account all sources of uncertainty in the model formulation and is applicable to different statistical domains, whereas our approximations originate from a specific actuarial context. We want to illustrate, however, that the predictive distribution based on the comonotonic bounds provides results that are close to the results obtained via Markov Chain Monte Carlo (MCMC). The main advantage of the bounds is that several risk measures (e.g., VaRs, TailVaRs, and stop-loss premia) can be calculated easily from them. Moreover, the use of the upper bound (see below) leads to a conservative estimate.


In any realistic model for the return process, R will be a sum of strongly dependent random variables. Because one cannot rely on traditional risk theory, it becomes very hard or even impossible to compute the cumulative distribution function (cdf) of R analytically. For actuaries, however, the complete predictive distribution of the reserve is important because different risk measures can be calculated from it. As illustrated by Verrall (2004) (for GLMs) and in earlier work by, for instance, de Alba (2002) (for lognormal models), Bayesian techniques are useful in this area, as they provide the posterior predictive distribution of the reserve. This discussion compares the results from Bayesian GLMs (as used in Verrall) with the results based on upper and lower bounds for R for which the cdf can be easily calculated.


Let S = X^sub 1^ + . . . + X^sub n^ be an arbitrary sum of random variables. In many actuarial and financial problems (e.g., claims reserving) the distribution of such a sum of random variables is of interest (check Dhaene et al. [2002b] for various illustrations). In general, however, this cdf cannot be obtained analytically because of the dependency structure involved in the stochastic vector (X^sub 1^, . . . , X^sub n^). Kaas et al. (2000) and Dhaene et al. (2002a) propose a possible way out by considering upper and lower bounds for (the cdf of) S that allow explicit calculations of various actuarial quantities.

The original dependency structure can be replaced by a simpler one, which leads to a more “dangerous” random variable but allows analytical calculations. Decisions (e.g., “Which amount of money should be held aside?”) based on this random variable lead to a safe, conservative strategy. However, in the context of claims reserving, provisions will be held that are too high. Therefore it makes sense to also consider a less “dangerous” random variable that allows analytical calculations of various risk measures.
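The idea of a “more dangerous” random variable can be made concrete with a small Monte Carlo sketch. The example below is illustrative only (the exponential marginals and the retention level are arbitrary choices, not from the paper): it builds a sum of two identically distributed terms twice, once with independent terms and once with comonotonic terms. Both sums have the same mean, but the comonotonic sum carries the larger stop-loss premium E[(S - d)^sub +^] at any retention d.

```python
import math
import random

random.seed(0)
N = 200_000

def stop_loss(draws, d):
    """Empirical stop-loss premium E[(S - d)_+]."""
    return sum(max(s - d, 0.0) for s in draws) / len(draws)

# Two exponential(1) marginals, combined in two ways.
indep, comon = [], []
for _ in range(N):
    u1, u2 = random.random(), random.random()
    x = -math.log(1.0 - u1)        # X = F^{-1}(U1)
    y_indep = -math.log(1.0 - u2)  # independent: Y driven by its own uniform
    y_comon = -math.log(1.0 - u1)  # comonotonic: Y driven by the SAME uniform
    indep.append(x + y_indep)
    comon.append(x + y_comon)

# Equal marginals imply equal means (both are 2 here), but the comonotonic
# ("more dangerous") sum has the larger stop-loss premium.
d = 3.0
sl_indep = stop_loss(indep, d)
sl_comon = stop_loss(comon, d)
```

For these marginals the comonotonic stop-loss premium at d = 3 is 2e^sub -1.5^ (about 0.446), against (d + 2)e^sub -d^ (about 0.249) for the independent sum, so the simulation makes the convex-order gap visible directly.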

Kaas et al. (2000) and Dhaene et al. (2002a) provide a mathematical foundation for these intuitive ideas. The concept “more/less dangerous” is defined by the notion of convex order: X ≤ Y in convex order if and only if E[X] = E[Y] and E[(X – d)^sub +^] ≤ E[(Y – d)^sub +^] for all real d. Upper and lower bounds for S are derived for which the terms in the involved sums are comonotonic, meaning that they are all monotonic (increasing or decreasing) functions of the same random variable. It is precisely this comonotonicity that makes it possible to compute various actuarial quantities analytically.
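One reason comonotonicity permits analytical work is that the quantile function of a comonotonic sum is simply the sum of the marginal quantile functions. The Python sketch below verifies this on a toy example (three lognormal marginals with arbitrary parameters, standing in for discounted payments; this is not the paper's reserve model):

```python
import math
import random
from statistics import NormalDist

def lognormal_ppf(p, mu, sigma):
    """Quantile function of a lognormal(mu, sigma)."""
    return math.exp(mu + sigma * NormalDist().inv_cdf(p))

# Marginals of three hypothetical discounted payments (illustrative parameters).
params = [(0.0, 0.5), (0.2, 0.4), (0.4, 0.3)]

# For a comonotonic sum S^u = sum_i F_i^{-1}(U), quantiles are additive:
def upper_bound_quantile(p):
    return sum(lognormal_ppf(p, m, s) for m, s in params)

# Monte Carlo check: drive every term with the same uniform U.
random.seed(1)
draws = sorted(
    sum(lognormal_ppf(u, m, s) for m, s in params)
    for u in (random.random() for _ in range(100_000))
)
mc_q95 = draws[int(0.95 * len(draws))]
analytic_q95 = upper_bound_quantile(0.95)
```

The simulated 95th percentile agrees with the closed-form sum of marginal 95th percentiles, which is exactly the property exploited when high percentiles of the bounds are computed analytically.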

Hoedemakers et al. (2003) extend the results mentioned above from ordinary sums of variables to sums of scalar products of independent random variables. This extension makes it possible to derive a lower and an upper bound for the discounted loss reserve R. Based on the concept of comonotonicity, they explain how the cdf of these bounds can be derived explicitly.


Consider the run-off triangle in Table 1, taken from Taylor and Ashe (1983) and used in various other publications on claims reserving.

The discounting process from (1) (with µ = 0.08 and δ = 0.11) is incorporated in the WinBugs code for the gamma GLM. To enable comparisons with the results based on the comonotonic bounds, flat priors were used both for the row and column parameters of the linear predictor and for the scale parameter in the gamma model. Table 2 contains the results for the 95th percentile of the predictive distribution obtained via MCMC simulations with the WinBugs program. A burn-in of 10,000 iterations was allowed, after which another 10,000 iterations were performed.

As mentioned before, an upper bound (denoted by R^sub u^) and a lower bound (denoted by R^sub l^) for R (in convex order) can be derived. R^sub u^ and R^sub l^ are sums of comonotonic terms. Precisely this quality enables the analytical calculation of various actuarial quantities (such as high percentiles). The bounds for the discounted loss reserve use the maximum likelihood estimates of the parameters in the linear predictor. To incorporate the error arising from the estimation of these parameters, Hoedemakers et al. (2004) apply bootstrap methodology. For details on this procedure we refer to the paper. Table 2 compares the Bayesian 95th percentile and the bootstrapped 95th percentile of the lower and upper bound for the different reserves. We bootstrapped 1,000 times, each time computed (analytically) the 95th percentile of the upper and lower bound, and report in Table 2 the 95th percentile of the bootstrap samples obtained in this way.
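The bootstrap step can be sketched as follows. This Python toy uses hypothetical data and a deliberately simplified "analytic percentile" (a moment-fitted lognormal quantile standing in for the GLM refit and the comonotonic-bound percentile of the paper); only the procedure, resample, recompute the analytic percentile, then take the 95th percentile of the bootstrap values, mirrors what lies behind Table 2.

```python
import math
import random
import statistics
from statistics import NormalDist

random.seed(42)

# Hypothetical log-claims; in the paper the resampling acts on a fitted GLM.
log_claims = [11.2, 11.5, 11.8, 11.4, 11.6, 11.9, 11.3, 11.7]

def analytic_q95(sample):
    """Stand-in for the analytically computed 95th percentile of a bound:
    here, the 95th percentile of a lognormal fitted by moments."""
    mu = statistics.fmean(sample)
    sigma = statistics.pstdev(sample)
    return math.exp(mu + sigma * NormalDist().inv_cdf(0.95))

# Resample 1,000 times, recompute the analytic percentile each time, then
# take the 95th percentile of the bootstrap distribution.
boot = sorted(
    analytic_q95(random.choices(log_claims, k=len(log_claims)))
    for _ in range(1000)
)
bootstrap_95th = boot[int(0.95 * len(boot))]
```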

Compared with the Bayesian results, Table 2 illustrates that the comonotonicity approach indeed provides relevant information concerning the predictive distribution of discounted loss reserves in a GLM framework.


DE ALBA, ENRIQUE. 2002. Bayesian Estimation of Outstanding Claims Reserves. North American Actuarial Journal 6(4): 1-20.

DHAENE, JAN, MICHEL DENUIT, MARC GOOVAERTS, ROB KAAS, AND DAVID VYNCKE. 2002a. The Concept of Comonotonicity in Actuarial Science and Finance: Theory. Insurance: Mathematics and Economics 31(1): 3-33.

______. 2002b. The Concept of Comonotonicity in Actuarial Science and Finance: Applications. Insurance: Mathematics and Economics 31(2): 133-61.

HOEDEMAKERS, TOM, JAN BEIRLANT, MARC GOOVAERTS, AND JAN DHAENE. 2003. Confidence Bounds for Discounted Loss Reserves. Insurance: Mathematics and Economics 33(2): 297-316.

_____. 2005. On the Distribution of Discounted Loss Reserves Using Generalized Linear Models. Scandinavian Actuarial Journal (1): 25-45.

KAAS, ROB, JAN DHAENE, AND MARC GOOVAERTS. 2000. Upper and Lower Bounds for Sums of Random Variables. Insurance: Mathematics and Economics 27(2): 151-68.

TAYLOR, GREG, AND FRANK ASHE. 1983. Second Moments of Estimates of Outstanding Claims. Journal of Econometrics 23: 37-61.

VERRALL, R. J. 2004. A Bayesian Generalized Linear Model for the Bornhuetter-Ferguson Method of Claims Reserving. North American Actuarial Journal 8(3): 67-89.

* Katrien Antonio is a PhD student at the University Center for Statistics, de Croylaan 54, 3001 Heverlee, Belgium, Katrien.Antonio@econ.kuleuven.ac.be.

[dagger] Jan Beirlant is a Professor at the University Center for Statistics, de Croylaan 54, 3001 Heverlee, Belgium, Jan.Beirlant@wis.kuleuven.ac.be.

[double dagger] Tom Hoedemakers is a PhD student at the Department of Applied Economics, Catholic University Leuven, Naamsestraat 69, 3000 Leuven, Belgium, and at the University Center for Statistics, Tom.Hoedemakers@econ.kuleuven.ac.be.



I thank Professor Verrall for an interesting paper presenting an innovative and intriguing set of methods for Bayesian reserving. I welcome this opportunity to contribute several comments, some questions for the author, and some of my own suggestions. I will focus my remarks on the over-dispersed Poisson model introduced by Verrall, leaving it to be understood that similar remarks may also be directed towards his negative binomial model.


Now, this over-dispersed Poisson distribution plays two roles in Verrall’s model. On the one hand, realizations drawn from the posterior predictive distribution are simulated using (2.1), conditional on the current simulated values of the model parameters µ^sub i,j^ and φ. By definition, any such simulated values of C^sub i,j^ are necessarily non-negative. This is the case for predictive draws corresponding to future observations, and also for replicated draws as described below. On the other hand, Verrall adopts a quasi-likelihood approach in order to extend the definition of the model in order to accommodate arbitrary positive or negative observed claims data values.
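The simulation step can be sketched in Python. The function below (my own illustrative stand-in, with an arbitrary mean and the dispersion value 1.08676 quoted later in this discussion) uses the standard construction of an over-dispersed Poisson variate as φ times an ordinary Poisson with mean µ/φ, which makes the non-negativity of every simulated value explicit.

```python
import math
import random

def odp_draw(mu, phi, rng):
    """Over-dispersed Poisson draw: phi times an ordinary Poisson with
    mean mu/phi, so E = mu, Var = phi*mu, and every draw is a
    non-negative multiple of phi."""
    lam = mu / phi
    # Knuth's Poisson sampler; adequate for moderate lam.
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return phi * k
        k += 1

rng = random.Random(0)
draws = [odp_draw(5.0, 1.08676, rng) for _ in range(50_000)]
mean = sum(draws) / len(draws)
var = sum((d - mean) ** 2 for d in draws) / len(draws)
# mean is close to mu, var is close to phi*mu, and no draw is ever negative,
# which is the point made about replicated data below.
```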

The method of posterior predictive checking (as described in Gelman, Carlin, Stern, and Rubin [2004, Chapter 6]) is often used to determine whether a Bayesian model is consistent with observed data. The idea here is that if “the model fits, then replicated data generated under the model should look similar to observed data. To put it another way, the observed data should look plausible under the posterior predictive distribution” (Gelman et al. 2004, p. 159). To be clear, think of replicated data as the data “that could have been observed, or, to think predictively, as the data we would see tomorrow if the experiment that produced” today’s data “were replicated with the same model and the same” parameter values that produced the observed data (Gelman et al. 2004, p. 161).

This leads us to identify a problem inherent in Verrall’s Bayesian model, namely, that replicated data values corresponding to negative observed values will never themselves be negative. Professor Verrall notes that “the model can cope with some negative values (as in the data set used in Section 6)” (page 80). However, I believe the point I’ve identified above recommends against this model’s use whenever negative values are contained in the observed data. Otherwise, the posterior predictive distribution will be inconsistent with the observed negative data values. Note that the posterior predictive distribution will also be inconsistent with future expectations if negative values are anticipated in the future data.

I believe it would be useful, and I would personally be grateful, if Professor Verrall were to provide some precise references to papers in the mainstream statistics literature in which an overdispersed Poisson model has been applied to a data set including negative observed values. I would be interested in reading what the statistics literature says on this subject.


Professor Verrall notes that a “fully Bayesian approach would assign a prior distribution for φ as well, and integrate it out” (page 70). This is fairly easy to accomplish within WinBugs, as demonstrated below.

The WinBugs code for the over-dispersed Poisson model given in the Appendix of Verrall’s paper includes these lines:

(see page 83). Unfortunately, some of the variable names used in the WinBugs code contained in the Appendix of Verrall’s paper differ from the names used in its main body. In the WinBugs code reproduced above, Z[i] represents one of the incremental claim amounts C^sub i,j^ divided by 1,000, mu[i] is the corresponding mean, and scale appears to be the variable that corresponds to the parameter φ. These lines of WinBugs code define the contribution made by an observed incremental claim to the likelihood in accordance with (2.1), in the special case that the parameter φ is known. Note that the variable phi[i] is the negative log-likelihood term (up to an additive constant) associated with the particular observed incremental claim, when the parameter φ is assumed fixed and known. However, this term is incomplete when φ is treated as a stochastic variable. In this case, I believe that the WinBugs code should be changed to:

The value assigned to phi[i] comes from taking minus the logarithm of expression (2.1). Note that the log of the gamma function (i.e., loggam) may cause difficulty if the Z[i] variable appearing in its argument takes on a negative value. This is another reason for suggesting that Verrall’s model is not appropriate when negative values appear in the claims data.
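For concreteness, the full term can be sketched in Python (my own transcription of minus the log of the over-dispersed Poisson density, with illustrative arguments; this is not code from the paper):

```python
import math

def odp_neg_loglik(z, mu, phi):
    """Minus the log of the over-dispersed Poisson density
        f(z) = exp(-mu/phi) * (mu/phi)**(z/phi) / Gamma(z/phi + 1),
    keeping every term that involves phi -- necessary once phi is
    treated as stochastic rather than fixed."""
    return mu / phi - (z / phi) * math.log(mu / phi) + math.lgamma(z / phi + 1.0)

# Sanity check: on the lattice z = phi*k the density reduces to an ordinary
# Poisson pmf in k, so the probabilities sum to one.
phi, mu = 1.08676, 5.0
total = sum(math.exp(-odp_neg_loglik(phi * k, mu, phi)) for k in range(200))

# For z <= -phi the argument z/phi + 1 of the log-gamma is nonpositive,
# where the gamma function has poles and sign changes, so the density is no
# longer well defined -- the difficulty with negative claims noted above.
```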

If the modification described above is the only change made to Verrall’s code, then the simulation results will be unchanged provided that the variable scale (i.e., the parameter φ) is held fixed at the same value as before. In his analysis, Verrall uses the value 1.08676 for scale and the following assignment appears in the WinBugs code:

(see page 84). However, we want to perform the analysis incorporating a prior distribution for the variable scale instead of assigning it a fixed value. For the sake of illustration, we assign scale a prior distribution that is gamma. Its prior mean is set equal to the previously fixed value of 1.08676, and the analysis will be performed using two different values for its prior variance (i.e., 0.01 and 0.1). To model this in WinBugs, replace the single line of code given above with these:

The variable scale should also be assigned an initial value (e.g., a value around 1.2) when the initial values of the parameters are loaded prior to running this simulation in WinBugs.
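Translating a stated prior mean and variance into the shape/rate parameterisation used by WinBugs's dgamma can be sketched as follows (a small helper of my own, not code from the discussion):

```python
def gamma_hyperparameters(mean, variance):
    """Shape and rate of a gamma prior with the given mean and variance:
    mean = shape / rate and variance = shape / rate**2."""
    rate = mean / variance
    shape = mean * rate
    return shape, rate

# The two priors used above: mean 1.08676 with variance 0.01 or 0.1.
tight = gamma_hyperparameters(1.08676, 0.01)  # roughly (118.1, 108.7)
loose = gamma_hyperparameters(1.08676, 0.1)   # roughly (11.81, 10.87)
```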

I ran Verrall’s original model, along with the two models incorporating a stochastic scale described above. Each simulation was allowed to burn in for 50,000 iterations, and then allowed to run for another 10,000 iterations in order to obtain the results shown below. Table 1 shows the results of running Verrall’s original WinBugs code for the over-dispersed Poisson model. The Bayesian Mean Reserve (BMR) and the Bayesian Prediction Error (BPE) are the estimated mean and standard deviation of the posterior distribution as returned by WinBugs. The Bayesian Prediction Error % is the ratio of the BPE to the BMR (expressed as a percentage). These results are essentially the same as those reported in Table 2 of Verrall’s own paper (see page 77). Tables 2 and 3 show the results for the model when the variable scale (i.e., the parameter φ) is treated as a stochastic variable with a prior distribution that is gamma, with a mean of 1.08676 and a variance of 0.01 and 0.1, respectively. It is apparent that the reserves in Tables 1 and 2 are smaller and much less variable than the ones reported in Table 3. Note that the 95% posterior probability intervals reported by WinBugs for the overall reserve are (25,000, 102,200), (21,200, 115,200), and (9,069, 189,300), for the models underlying Tables 1, 2, and 3, respectively. Also note that for the model underlying Table 2, the parameter φ has an estimated posterior mean and standard deviation of 1.501 and 0.1122, respectively, as returned by WinBugs. For the model underlying Table 3, these values are 4.197 and 0.569.

The DIC Tool dialog box in WinBugs may be used to monitor the Deviance Information Criterion (DIC) (Spiegelhalter, Best, Carlin, and van der Linde 2002) and related statistics. These statistics can be used to assess model complexity and compare different models. The model with the smallest DIC is estimated to be the model that would best predict a replicate dataset (as discussed above) of the same structure as that currently observed. I monitored the DIC for each of the three models I simulated. For the model underlying Table 1 (with scale fixed and set equal to 1.08676), the DIC was 213.4; for the model underlying Table 2 (with a stochastic variable scale with prior variance of 0.01), it was 190.4; and for the model underlying Table 3 (with a stochastic variable scale with prior variance of 0.1), it was 133.8. These results suggest that the third model is the best of the three.

This example shows that it is relatively straightforward to implement a fully Bayesian analysis of Verrall’s over-dispersed Poisson model, incorporating a prior distribution for φ, in WinBugs. In this case, the result was a better-fitting model. Furthermore, ignoring the stochastic variability in φ resulted in an underestimation of the various years’ reserves, and in a significant understatement of their variability. Verrall’s negative binomial model may also benefit from the introduction of a stochastic, rather than fixed, parameter φ. In either case, the prior distribution for φ is probably best determined on the basis of expert opinion or the results of a prior study. If a vague prior distribution (improper, or proper but relatively widely dispersed) is used for φ, then it may be necessary to tighten up some of the other parameters’ prior distributions in order for WinBugs to successfully compile and run.


The author acknowledges support from the Natural Sciences and Engineering Research Council of Canada.


GELMAN, ANDREW, JOHN B. CARLIN, HAL S. STERN, AND DONALD B. RUBIN. 2004. Bayesian Data Analysis. 2nd ed. Boca Raton, FL: Chapman & Hall/CRC.

SPIEGELHALTER, DAVID J., NICOLA G. BEST, BRADLEY P. CARLIN, AND ANGELIKA VAN DER LINDE. 2002. Bayesian Measures of Model Complexity and Fit. Journal of the Royal Statistical Society, Series B: Statistical Methodology 64: 583-616.

* David P. M. Scollnik, ASA, PhD, is a Professor in the Department of Mathematics and Statistics, University of Calgary, 2500 University Drive NW, Calgary, Alberta, Canada, T2N 1N4, e-mail: scollnik@ucalgary.ca.


I would like to thank the discussants for their interesting contributions. As both point out, Bayesian statistics offer great possibilities for the actuary and we have some good suggestions here for what could be useful. For readers who would like to try the Bayesian methods using the software WinBUGS, I have included more detailed instructions in the Forum of the Casualty Actuarial Society, Verrall (2004). This paper also contains some other examples of how Bayesian methods might be useful, which are covered in more detail in Verrall and England (2005). This is clearly an expanding area, as is shown by the contributions of the discussants and the papers mentioned above. I expect that it will continue to develop fast, and that the presently predominant methods will be replaced by better approaches in the future.

Professor Scollnik makes some criticisms of the paper, which, it could be argued, arise from my desire to provide stochastic methods that stay close to the present actuarial procedures. Given complete freedom to decide on what data should be supplied, what methods are acceptable to management, regulators, and so on, and complete freedom to use methods as complicated and sophisticated as might be desirable, I would probably do things differently. Faced with the reality of the pressures of commercial life, my desire has been to show how it is possible to move gradually towards a stochastic approach. Once this has been achieved (and it is beginning to happen), there will be more possibilities to use other methods, and I am sure that researchers, such as Scollnik, will be able to provide better models. For example, Scollnik helpfully shows how the estimation of the dispersion parameter may be included in the Bayesian model. This is something I have also looked at, using the suggestions of West (1985), but I decided that it would be better to make a consistent comparison with the non-Bayesian methods, where the usual procedure is to ignore the effect of the estimation of the dispersion parameter.

Strictly, the use of the over-dispersed Poisson or negative binomial implies that the data consist of multiples of the dispersion parameter. In practice, this can be overlooked for the observed past data by using the “zeros trick,” as I did in the WinBUGS code here. It is obviously more problematic when simulating future observations: the properties of the predictive distributions will be consistent with the assumptions made, but the distribution of a single cell in the run-off triangle will appear unrealistic. Another approach is to use an alternative forecast distribution, such as the gamma distribution, parameterised so that the mean and variance equal their theoretical values. In making this pragmatic compromise, it could be argued that we are introducing a conceptual inconsistency. An alternative to the over-dispersed negative binomial distribution is to use the normal distribution. This was applied in England and Verrall (2002) in the context of the chain-ladder technique, and an equivalent approach could also be used for the Bornhuetter-Ferguson technique; it is possible that a strict Bayesian would prefer this.
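The moment-matching gamma forecast mentioned above can be sketched in Python (illustrative values of my own choosing): setting shape = µ/φ and scale = φ gives a continuous forecast distribution whose mean is µ and whose variance is φµ, matching the over-dispersed Poisson's first two moments without the lattice artefact.

```python
import random

def gamma_forecast(mu, phi, rng):
    """Continuous forecast draw matching the over-dispersed Poisson's first
    two moments: a gamma with shape mu/phi and scale phi has mean
    shape*scale = mu and variance shape*scale**2 = phi*mu."""
    return rng.gammavariate(mu / phi, phi)

rng = random.Random(7)
draws = [gamma_forecast(5.0, 1.08676, rng) for _ in range(100_000)]
mean = sum(draws) / len(draws)
var = sum((d - mean) ** 2 for d in draws) / len(draws)
# mean is close to 5.0 and var close to 1.08676 * 5.0, but the draws form a
# continuous distribution rather than multiples of the dispersion parameter.
```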


VERRALL, R. J. 2004. Obtaining Predictive Distributions for Reserves Which Incorporate Expert Opinion. Casualty Actuarial Society Forum, Fall.

VERRALL, R. J., AND P. D. ENGLAND. 2005. Incorporating Expert Opinion into a Stochastic Model for the Chain-Ladder Technique. Insurance: Mathematics and Economics. Forthcoming.

WEST, M. 1985. Generalised Linear Models Scale Parameters, Outlier Accommodation and Prior Distributions. In Bayesian Statistics 2, ed. A. P. Dawid, D. Heckerman, A. F. M. Smith and M. West. Amsterdam: North-Holland.

Copyright Society of Actuaries Jul 2005
