“Toward a Unified Approach to Fitting Loss Models,” Stuart Klugman and Jacques Rioux, January 2006

Sun, Jiafeng


The paper by Professors Klugman and Rioux provides a unified approach to fitting loss models. We agree with the authors on the four steps needed to identify an appropriate model: selecting candidate distributions, estimating parameters, evaluating models, and determining which model fits best. To underscore the importance of their work, we consider a more general framework in which an actuary introduces explanatory variables, also known as covariates, through a regression model. Klugman, Panjer, and Willmot (2004) incorporate covariates through the Cox proportional hazards model, a type of regression model for lifetime data. In this discussion we focus on regression models with heavy-tailed distributions.

In practice, actuaries often encounter situations where the measurements of interest are related to other factors. However, the usual regression routines implicitly require approximate normality, an assumption that is hardly satisfactory for loss data with heavy tails. One solution widely used in the literature to deal with skewness is to apply a logarithmic transformation to the dependent variable Y and then apply ordinary least squares (OLS) to ln(Y). A second is to use generalized linear models (GLMs), which provide a natural way to include covariates. A third is to use parametric survival models, including location-scale models and proportional hazards models (Lawless 2003). A fourth is to model the data with more flexible distributions for positive random variables, such as the Burr or the generalized gamma distribution. This last approach has been widely used in the econometrics literature.

We briefly review the last three approaches in this discussion. Following Klugman and Rioux’s theme, we illustrate the model-fitting process using State of Wisconsin nursing home data from cost report year 1999.


The GLM technique has been applied in actuarial science since the early 1980s. Haberman and Renshaw (1996) reviewed the applications of GLMs to actuarial problems, including survival models in life insurance, multiple-state models in health insurance, loss distributions fitted to claim severities, premium rating, and claims reserving in nonlife insurance. In the classic work by McCullagh and Nelder (1989), many insurance examples illustrate how to fit GLMs to different types of data.

GLMs extend traditional linear models and represent an important class of nonlinear regression models. In GLMs the random variable Y has a distribution in the exponential family. Discrete distributions in the exponential family include the binomial, negative binomial (with known shape parameter), Poisson, and multinomial; continuous distributions include the normal, gamma, and inverse Gaussian. The idea of a GLM is to map a linear systematic component to the mean of the variable of interest through a known, differentiable, monotonic link function. Just as the properties of least-squares estimates depend only on the assumptions of constant variance and independence, the second-order properties of GLMs depend only on the relationship between the variance and the mean, together with independence between observations. Thus, in GLMs the variance need not be constant, as it is in linear models; instead, it is a function of the mean. The models are fitted by maximum likelihood using iteratively reweighted least squares; see, for example, Frees (2004). For a complete treatment of the theory, refer to McCullagh and Nelder (1989).
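
The fitting step described above can be sketched in a few lines. The following Python sketch is a minimal illustration, not the software used in this discussion: it assumes a gamma GLM with the identity link (the link used later for the nursing home models), for which the variance function is V(μ) = μ², so iteratively reweighted least squares reduces to repeated weighted least squares with weights 1/μ². The function name `irls_gamma_identity` and the simulated data are hypothetical.

```python
import numpy as np


def irls_gamma_identity(X, y, tol=1e-10, max_iter=100):
    """Fit a gamma GLM with identity link by iteratively reweighted least squares.

    Gamma family: V(mu) = mu**2. Identity link: g(mu) = mu, g'(mu) = 1, so the
    working response equals y and the working weights are 1 / mu**2.
    """
    beta = np.linalg.lstsq(X, y, rcond=None)[0]  # OLS starting values
    for _ in range(max_iter):
        mu = X @ beta
        if np.any(mu <= 0):
            raise ValueError("a fitted mean is non-positive; identity link failed")
        w = 1.0 / mu**2                  # working weights 1 / (V(mu) * g'(mu)**2)
        XtW = X.T * w                    # X'W without forming the n-by-n matrix W
        beta_new = np.linalg.solve(XtW @ X, XtW @ y)  # weighted least squares step
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta


# Hypothetical simulated data: E[Y | x] = 2 + 3x, gamma shape 5.
rng = np.random.default_rng(0)
x = rng.uniform(1.0, 10.0, size=5000)
X = np.column_stack([np.ones_like(x), x])
mu_true = 2.0 + 3.0 * x
y = rng.gamma(shape=5.0, scale=mu_true / 5.0)  # mean = shape * scale = mu_true
beta_hat = irls_gamma_identity(X, y)
print(beta_hat)
```

Production GLM software adds step-halving and other convergence safeguards; with the identity link the fitted means must stay positive, which this sketch only checks rather than enforces.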

In regression analysis, the process of examining a preliminary model fit and using information about any lack of fit to improve the model specification is known as diagnostic analysis (much as a physician examines a patient for symptoms of poor health). Residual analysis, an important type of diagnostic analysis, involves both numerical and graphical inspection of the estimated models. For ordinary regression models, residuals are defined as the difference between the observed value and the fitted value. For GLMs, however, an extended definition of residuals that applies to distributions other than the normal is needed. Several definitions have been proposed; see, for example, McCullagh and Nelder (1989) and Pierce and Schafer (1986). In practice, the most widely used are Pearson residuals and deviance residuals. A Pearson residual is the signed square root of an observation’s contribution to the Pearson goodness-of-fit statistic; a deviance residual is the signed square root of its contribution to the deviance goodness-of-fit statistic. Pierce and Schafer examined residuals in GLMs extensively and found that deviance residuals are preferable to Pearson residuals for model checking because, for nonnormal data, the distribution of Pearson residuals is often skewed, whereas deviance residuals are nearly normally distributed.
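
The two residual definitions can be made concrete for the gamma family. The sketch below assumes unscaled residuals (the dispersion parameter is not divided out) and evaluates them at the true means of simulated, hypothetical data, to illustrate the Pierce and Schafer observation that Pearson residuals inherit the skewness of the response while deviance residuals are close to symmetric. The helper names are hypothetical.

```python
import numpy as np


def gamma_residuals(y, mu):
    """Unscaled Pearson and deviance residuals for a gamma GLM.

    Pearson:  (y - mu) / sqrt(V(mu)) with V(mu) = mu**2.
    Deviance: sign(y - mu) * sqrt(d_i), where the unit deviance is
              d_i = 2 * (-log(y / mu) + (y - mu) / mu).
    """
    y = np.asarray(y, dtype=float)
    mu = np.asarray(mu, dtype=float)
    pearson = (y - mu) / mu
    unit_dev = 2.0 * (-np.log(y / mu) + (y - mu) / mu)
    deviance = np.sign(y - mu) * np.sqrt(np.maximum(unit_dev, 0.0))
    return pearson, deviance


def skewness(r):
    """Sample skewness (third standardized central moment)."""
    r = np.asarray(r, dtype=float)
    c = r - r.mean()
    return np.mean(c**3) / np.mean(c**2) ** 1.5


# Hypothetical gamma data (shape 2) evaluated at the true means.
rng = np.random.default_rng(1)
mu = rng.uniform(5.0, 50.0, size=20000)
y = rng.gamma(shape=2.0, scale=mu / 2.0)
r_p, r_d = gamma_residuals(y, mu)
print(round(skewness(r_p), 2), round(skewness(r_d), 2))
```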


Parametric survival analysis uses regression models to analyze censored data, but the methods can certainly be applied to complete data. For a complete treatment of parametric survival models, see Lawless (2003) and Kalbfleisch and Prentice (2002). There are two classes of parametric regression models in survival analysis: accelerated failure time (AFT) models and proportional hazards (PH) models. An AFT model is also called a log-location-scale model. In an AFT model, ln(Y) follows a location-scale distribution whose density has the form $f(y) = \frac{1}{b}\, g\left(\frac{y - u}{b}\right)$, where $u$ and $b > 0$ are location and scale parameters and $g(\cdot)$ is a pdf on $(-\infty, \infty)$. Most commonly used lifetime distributions have the property that their logarithm follows a location-scale distribution. For example, the logarithms of Weibull, lognormal, and loglogistic variables follow the extreme value, normal, and logistic distributions, respectively. In regression analysis, the location parameter $u$ is replaced by $x'\beta$ to incorporate covariates, where $x$ is a vector of covariates including an intercept and $\beta$ is a vector of regression coefficients to be estimated. We may also allow the scale parameter $b$ to depend on $x$; since $b$ is positive, a common specification is $b(x) = \exp(x'\psi)$. A proportional hazards model has the property that the hazard functions of different subjects are scalar multiples of one another. One particularly useful form is $h(y \mid x) = h_0(y)\, e^{x'\beta}$, where $h_0(y)$ is the baseline hazard function; see Klugman, Panjer, and Willmot (2004).
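
The log-location-scale relationship can be checked numerically for the Weibull case. The sketch below assumes the Weibull density with shape k and scale λ and the standard (minimum) extreme value density g(z) = exp(z - e^z); with u = ln λ and b = 1/k, the density of ln(Y) obtained by a change of variables should coincide with g((y - u)/b)/b. All function names, including `aft_params` for the covariate-dependent location x'β and scale exp(x'ψ), are illustrative.

```python
import math


def weibull_pdf(t, k, lam):
    """Weibull density with shape k and scale lam."""
    return (k / lam) * (t / lam) ** (k - 1) * math.exp(-((t / lam) ** k))


def log_weibull_pdf_via_change_of_variables(y, k, lam):
    """Density of ln(T) at y when T ~ Weibull(k, lam): f_T(e^y) * e^y."""
    t = math.exp(y)
    return weibull_pdf(t, k, lam) * t


def location_scale_pdf(y, u, b, g):
    """Location-scale density f(y) = g((y - u) / b) / b."""
    return g((y - u) / b) / b


def extreme_value_g(z):
    """Standard (minimum) extreme value density g(z) = exp(z - e^z)."""
    return math.exp(z - math.exp(z))


def aft_params(x, beta, psi):
    """Covariate-dependent location u = x'beta and scale b = exp(x'psi)."""
    u = sum(xj * bj for xj, bj in zip(x, beta))
    b = math.exp(sum(xj * pj for xj, pj in zip(x, psi)))
    return u, b


# With u = ln(lam) and b = 1/k the two densities should agree at every y.
k, lam = 1.7, 250.0
for y in (3.0, 5.0, 5.5, 6.5):
    direct = log_weibull_pdf_via_change_of_variables(y, k, lam)
    via_ls = location_scale_pdf(y, math.log(lam), 1.0 / k, extreme_value_g)
    print(y, direct, via_ls)
```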

In parametric survival models, the residuals can be treated as functions of the response variable, the covariates, and the maximum likelihood estimates (mle) of the unknown coefficients. If the model is appropriate, the residuals should be approximately independent and identically distributed for large sample sizes n. For example, for AFT models we can define residuals as $z_i = (y_i - \hat{u}_i)/\hat{b}$, where $y_i$ is the logarithm of the ith response, $\hat{b}$ is the mle of the scale parameter, and $\hat{u}_i = u(x_i, \hat{\beta})$ is the estimated location parameter. As in other models, graphical tools can be used to inspect the estimated models. Plots of $z_i$ against the covariates or against the fitted location parameter $\hat{u}_i$ can be used to check whether the scale parameter $b$ is constant; if a pattern is present in the plot, we might introduce covariates for the scale parameter, and these need not be the same as those included in the location parameter. The q-q plot or the p-p plot of the residuals can also be used to assess model fit.
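
The residual check described above can be sketched as follows, assuming a Weibull AFT model with a constant scale parameter, so that the standardized residuals should follow the standard extreme value distribution. To keep the example self-contained, the data are simulated and the true β and b stand in for the maximum likelihood estimates; in an actual analysis the fitted values would be used. Function names are hypothetical.

```python
import numpy as np


def aft_residuals(log_y, X, beta_hat, b_hat):
    """Standardized residuals z_i = (y_i - u_hat_i) / b_hat, with y_i = ln(T_i)
    and u_hat_i = x_i' beta_hat (constant scale assumed)."""
    return (log_y - X @ beta_hat) / b_hat


def pp_coordinates(z, cdf):
    """p-p plot points: fitted CDF at the sorted residuals vs. empirical CDF."""
    z = np.sort(z)
    n = z.size
    empirical = (np.arange(1, n + 1) - 0.5) / n
    return cdf(z), empirical


def extreme_value_cdf(z):
    """CDF of the standard minimum extreme value distribution, 1 - exp(-e^z)."""
    return 1.0 - np.exp(-np.exp(z))


# Hypothetical Weibull AFT data: ln T = x'beta + b * W, W standard extreme value.
rng = np.random.default_rng(2)
n = 2000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta, b = np.array([5.0, 0.8]), 0.6
w = np.log(-np.log(rng.uniform(size=n)))   # inverse-CDF draw of W
log_y = X @ beta + b * w
z = aft_residuals(log_y, X, beta, b)       # true values stand in for the mles
fitted, empirical = pp_coordinates(z, extreme_value_cdf)
print(np.max(np.abs(fitted - empirical)))  # small when the model is adequate
```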


In addition to the distributions discussed in the preceding sections, more flexible distributions have been applied in regression analysis, including the generalized beta of the second kind (GBII), Burr, and generalized gamma distributions. The generalized gamma distribution is also known as the transformed gamma distribution (see, e.g., Klugman, Panjer, and Willmot 2004, p. 635). McDonald and Butler (1990) considered regression models based on commonly used distributions as well as on more general parametric families such as the GBII and generalized gamma distributions. They applied the models to the duration of poverty episodes and found that the GBII improved the fit significantly over the lognormal. Beirlant et al. (1998) proposed two Burr regression models and applied them to portfolio segmentation in fire insurance. Manning, Basu, and Mullahy (2005) applied the generalized gamma distribution to inpatient expenditures, using data from a study of hospitals conducted at the University of Chicago. As an example, we discuss the generalized gamma distribution.
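
For reference, the transformed gamma density can be coded directly. The sketch assumes the Loss Models parameterization f(x) = τ u^α e^(-u) / (x Γ(α)) with u = (x/θ)^τ, and the usage lines check the special cases τ = 1 (gamma) and α = 1 (Weibull). Other sources parameterize the distribution differently (for example, through location, scale, and shape on the log scale), so estimates are not directly comparable across parameterizations. In SciPy, `scipy.stats.gengamma` with `a = α`, `c = τ`, and `scale = θ` should give the same density.

```python
import math


def transformed_gamma_pdf(x, alpha, theta, tau):
    """Transformed (generalized) gamma density, Loss Models parameterization:
    f(x) = tau * u**alpha * exp(-u) / (x * Gamma(alpha)), u = (x / theta)**tau."""
    u = (x / theta) ** tau
    log_f = math.log(tau) + alpha * math.log(u) - u - math.log(x) - math.lgamma(alpha)
    return math.exp(log_f)


def gamma_pdf(x, alpha, theta):
    """Gamma density with shape alpha and scale theta."""
    return x ** (alpha - 1) * math.exp(-x / theta) / (theta**alpha * math.gamma(alpha))


def weibull_pdf(x, tau, theta):
    """Weibull density with shape tau and scale theta."""
    return (tau / x) * (x / theta) ** tau * math.exp(-((x / theta) ** tau))


# Special cases: tau = 1 gives the gamma, alpha = 1 gives the Weibull.
print(transformed_gamma_pdf(3.0, 2.5, 1.5, 1.0), gamma_pdf(3.0, 2.5, 1.5))
print(transformed_gamma_pdf(3.0, 1.0, 1.5, 0.7), weibull_pdf(3.0, 0.7, 1.5))
```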


Nursing home financing has drawn the attention of policymakers and researchers over the past several decades. With an aging population and increasing life expectancy, expenditures on nursing homes and the demand for long-term care are expected to increase. In this section we analyze data from 364 nursing facilities in the state of Wisconsin for cost report year 1999, using both generalized linear models (with inverse Gaussian and gamma distributions) and generalized gamma regression models. The data are publicly available; see http://dhfs.wisconsin.gov/provider/prev-yrs-reports-nh.htm for more information.

5.1 Summary Statistics

To illustrate the model-fitting process, we consider the response variable Total Patient Years (TPY), defined as total patient days divided by the number of days the facility was open. We also created a binary variable, Open, to account for the effect of partial-year operation, with 1 indicating that the facility was open for the full year.
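
The two constructed variables amount to simple arithmetic on each facility’s cost report. A minimal sketch, assuming hypothetical argument names and treating a facility as open for the full year when it was open all 365 days of 1999:

```python
def derived_variables(total_patient_days, days_open, days_in_year=365):
    """Return (TPY, Open) for one facility.

    TPY  = total patient days / number of days the facility was open.
    Open = 1 if the facility was open the full year, otherwise 0.
    """
    tpy = total_patient_days / days_open
    open_full_year = 1 if days_open >= days_in_year else 0
    return tpy, open_full_year


# Hypothetical facilities: one open all year, one open for 180 days.
print(derived_variables(36500, 365))
print(derived_variables(9000, 180))
```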

Table 1 displays descriptive statistics for the continuous variables. The mean TPY is larger than the median, indicating that the distribution of TPY is right-skewed. The number of beds (NumBed) and the square footage (SqrFoot) of a facility both measure its size. Not surprisingly, these continuous variables turn out to be important predictors of TPY: larger facilities have more capacity and thus are likely to admit more patients. Both NumBed and SqrFoot are highly correlated with the dependent variable TPY and with each other (correlation 0.84).

Besides Open, several categorical explanatory variables are included in our model. About 60% of the facilities self-fund their insurance (SelfFundIns), and approximately 85% are Medicare certified (MCert); Medicare-certified facilities are also larger in terms of the median number of patient years. Regarding organizational structure, about half (53.3%) are run on a for-profit (Pro) basis, about one-third (36.5%) are organized as tax exempt (TaxExempt), and the remainder (10.2%) are governmental organizations. The government facilities have the highest median TPY. Slightly more than half of the facilities (54.1%) are located in an urban (Urban) area; these facilities have a higher median TPY than those located in rural areas.

5.2 Fitting Generalized Linear Models

To get an initial sense of the distribution of TPY, we created quantile-quantile (q-q) plots, which compare the empirical quantiles to the quantiles from the estimated parametric models. An alternative is the p-p plot, as Professors Klugman and Rioux used in Figures 9 and 10 of their paper. The normal q-q plot, not shown here, departs from the 45° line, indicating that the empirical distribution differs from the theoretical distribution; the normal regression model is therefore not a reasonable fit. Figure 1 presents the q-q plots for the inverse Gaussian and gamma distributions. The right-skewed data fall fairly close to the 45° line in both panels, suggesting that both models are reasonable choices.
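
A q-q plot of this kind needs two ingredients: the sorted data and the fitted model’s quantile function evaluated at plotting positions. The sketch below assumes plotting positions (i - 0.5)/n and, to stay self-contained, uses an exponential model with a closed-form quantile function on simulated data; for the models in this section one would instead pass the fitted gamma or inverse Gaussian quantile function. Names and data are hypothetical.

```python
import numpy as np


def qq_coordinates(data, quantile_fn):
    """Return (theoretical, empirical) quantile pairs for a q-q plot.

    Empirical quantiles are the sorted observations; theoretical quantiles are
    the fitted model's quantile function at plotting positions (i - 0.5) / n.
    """
    empirical = np.sort(np.asarray(data, dtype=float))
    n = empirical.size
    probs = (np.arange(1, n + 1) - 0.5) / n
    return quantile_fn(probs), empirical


# Hypothetical example: exponential model fitted by maximum likelihood.
rng = np.random.default_rng(3)
data = rng.exponential(scale=40.0, size=1000)
theta_hat = data.mean()  # mle of the exponential mean
theoretical, empirical = qq_coordinates(data, lambda p: -theta_hat * np.log1p(-p))
# Plotting empirical against theoretical, points near the 45-degree line
# indicate that the fitted distribution is adequate.
```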

Table 2 summarizes the parameter estimates of the models. Both the scaled deviance and the scaled Pearson chi-square statistic can be used to assess goodness of fit. Under certain regularity conditions, these statistics have a limiting chi-square distribution with degrees of freedom equal to the number of observations minus the number of estimated parameters. For example, in the inverse Gaussian model the scaled deviance is 364.00 with 355 degrees of freedom; a straightforward calculation gives a p-value of 0.359, indicating that the model fits the data fairly well. Although not shown here, a similar conclusion holds for the gamma model. Because the two models have the same sample size and the same number of estimated parameters, comparing their Schwarz Bayesian Criterion (SBC) values, as suggested by Professors Klugman and Rioux, is equivalent to comparing their log-likelihoods; by this comparison, the gamma model appears to perform better than the inverse Gaussian.
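
The p-value quoted above can be reproduced from the upper tail of the chi-square distribution. The sketch below computes it with the series expansion of the regularized incomplete gamma function, so it needs only the Python standard library; the degrees of freedom follow from 364 observations and 364 - 355 = 9 estimated coefficients. With SciPy available, `scipy.stats.chi2.sf(364.0, 355)` should give the same value.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for X ~ chi-square(df), x > 0.

    Uses P(a, h) = h**a * exp(-h) / Gamma(a) * sum_{n>=0} h**n / (a (a+1) ... (a+n)),
    the series for the regularized lower incomplete gamma function, with
    a = df / 2 and h = x / 2; the upper tail is 1 - P(a, h).
    """
    a, h = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 0
    while term > 1e-16 * total:
        n += 1
        term *= h / (a + n)
        total += term
    log_p = a * math.log(h) - h - math.lgamma(a) + math.log(total)
    return 1.0 - math.exp(log_p)


n_obs, n_params = 364, 9          # 9 = 364 - 355 estimated coefficients
df = n_obs - n_params             # 355 degrees of freedom
p_value = chi2_sf(364.00, df)     # scaled deviance of the inverse Gaussian model
print(round(p_value, 3))
```

Far in the upper tail, where P(a, h) is close to 1, computing 1 - P loses precision; a continued-fraction evaluation is preferable there.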

As anticipated, the coefficient of the size variable NumBed is positive and significant in both models, whereas SqrFoot is significant only in the inverse Gaussian model. The dummy variable Open is marginally significant in both models, indicating that, given the other covariates, nursing facilities open the whole year tend to have a greater TPY than those open for only part of the year. Because we chose the identity link, the parameter estimates can be interpreted as in traditional linear models. For example, in the inverse Gaussian model, conditional on the other variables, the average TPY of facilities open for the whole of 1999 exceeds that of facilities open for part of the year by 2.953.

Figure 2 plots the standardized deviance residuals against the fitted values of TPY for the inverse Gaussian and gamma models. No systematic pattern appears in either plot, which suggests that both models fit the data reasonably well.
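
One common way to standardize deviance residuals, assumed here because the discussion does not state which standardization its software applied, divides each deviance residual by the square root of φ̂(1 - h_ii), where φ̂ is the Pearson estimate of the dispersion and h_ii is the leverage from the GLM hat matrix W^(1/2) X (X'WX)^(-1) X' W^(1/2). The sketch treats a gamma model with identity link on simulated, hypothetical data, with the true means standing in for fitted values; the function name is hypothetical.

```python
import numpy as np


def standardized_deviance_residuals_gamma_identity(X, y, mu):
    """Return (r_D / sqrt(phi_hat * (1 - h_ii)), h) for a gamma GLM with identity link.

    W = diag(1 / mu**2); h_ii are the diagonal elements of the GLM hat matrix;
    phi_hat is the Pearson chi-square statistic divided by (n - p).
    """
    n, p = X.shape
    w = 1.0 / mu**2
    Xw = X * np.sqrt(w)[:, None]                              # W^{1/2} X
    h = np.einsum("ij,ji->i", Xw, np.linalg.solve(Xw.T @ Xw, Xw.T))
    phi_hat = np.sum(((y - mu) / mu) ** 2) / (n - p)
    unit_dev = 2.0 * (-np.log(y / mu) + (y - mu) / mu)
    r_d = np.sign(y - mu) * np.sqrt(np.maximum(unit_dev, 0.0))
    return r_d / np.sqrt(phi_hat * (1.0 - h)), h


# Hypothetical data; mu stands in for the fitted values mu_hat.
rng = np.random.default_rng(5)
x = rng.uniform(1.0, 10.0, size=500)
X = np.column_stack([np.ones_like(x), x])
mu = 2.0 + 3.0 * x
y = rng.gamma(shape=5.0, scale=mu / 5.0)
r_std, h = standardized_deviance_residuals_gamma_identity(X, y, mu)
# Plot r_std against mu and look for systematic patterns.
```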

5.3 Fitting Generalized Gamma Models

The generalized gamma distribution has been widely used in the econometrics literature. As discussed in section 4, if Y follows a generalized gamma distribution, then ln Y has a location-scale distribution. In our regression model, the location parameter is replaced by a linear combination of the explanatory variables. The only difference from equation (D.2) is that here we take logarithms of NumBed and SqrFoot, because the log-location-scale model relates the logarithm of the variable of interest to a linear combination of the explanatory variables. We used the statistical software R to maximize the log-likelihood of the model. The parameter estimates are summarized in Table 3.
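
The maximization step can be sketched in Python (the analysis above used R). The sketch assumes the transformed gamma parameterization with θ_i = exp(x_i'β), so that ln Y_i has location x_i'β, and log-transforms the two shape parameters so that an unconstrained optimizer (for example, `scipy.optimize.minimize` applied to the negative log-likelihood) can be used. This is not necessarily the parameterization behind Table 3, and the simulated data and names are hypothetical.

```python
import numpy as np
from math import lgamma


def gg_regression_loglik(params, X, y):
    """Log-likelihood of a transformed (generalized) gamma regression with
    theta_i = exp(x_i' beta).

    params = (beta_0, ..., beta_{p-1}, log_alpha, log_tau).
    """
    p = X.shape[1]
    beta = np.asarray(params[:p])
    alpha, tau = np.exp(params[p]), np.exp(params[p + 1])
    log_ratio = np.log(y) - X @ beta          # ln(y_i / theta_i)
    u = np.exp(tau * log_ratio)               # u_i = (y_i / theta_i)**tau
    return np.sum(np.log(tau) + alpha * tau * log_ratio - u - np.log(y) - lgamma(alpha))


# Hypothetical data: Y = theta * G**(1/tau) with G ~ Gamma(alpha, 1).
rng = np.random.default_rng(4)
n = 3000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
params_true = np.array([2.0, 0.5, np.log(1.5), np.log(0.8)])
theta = np.exp(X @ params_true[:2])
y = theta * rng.gamma(shape=1.5, size=n) ** (1.0 / 0.8)
ll_true = gg_regression_loglik(params_true, X, y)
ll_off = gg_regression_loglik(params_true + np.array([0.3, -0.3, 0.0, 0.0]), X, y)
print(ll_true > ll_off)  # the true parameters should fit better
```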

The coefficient of ln(NumBed) is positive and significant, as expected. Its value is close to 1, which suggests that the logarithm of the occupancy rate, defined as the ratio of TPY to NumBed divided by the number of days the facility was open, could be an interesting variable to study. The variables Open, TaxExempt, and Urban are also significant. We define the residuals as $z_i = (\ln y_i - \hat{\mu}_i)/\hat{\sigma}$. Figure 3 presents the residual analysis of the model. The left panel plots the residuals $z_i$ against $\hat{\mu}_i$; since no obvious pattern is present, we do not include covariates in the scale parameter $\sigma$. The middle panel is the q-q plot of the residuals under the generalized gamma model, and the right panel is the corresponding p-p plot. The plots suggest that the model fits the data well. Moreover, the SBC of this model is much higher than those of the inverse Gaussian and gamma models.
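
The SBC comparison can be written down directly. The sketch assumes the Loss Models convention SBC = log-likelihood - (r/2) ln n, in which larger values are better, and that r counts every estimated parameter, including shape or dispersion parameters. It also assumes all log-likelihoods are computed for the same response: a model fitted to ln Y needs the Jacobian term -Σ ln y_i added before it can be compared with models fitted to Y. The helper names are hypothetical.

```python
import math


def sbc(loglik, n_params, n_obs):
    """Schwarz Bayesian Criterion, larger-is-better form: loglik - (r/2) * ln(n)."""
    return loglik - 0.5 * n_params * math.log(n_obs)


def loglik_on_y_scale(loglik_log_y, y):
    """Convert a log-likelihood for ln(Y) into one for Y: f_Y(y) = f_lnY(ln y) / y."""
    return loglik_log_y - sum(math.log(v) for v in y)


# Penalty for one additional parameter with the 364 Wisconsin facilities.
extra_penalty = 0.5 * math.log(364)
print(round(extra_penalty, 3))
```

With n = 364, each additional parameter lowers the SBC by about 2.95.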


We discussed three regression approaches to modeling heavy-tailed positive data; these approaches are not mutually exclusive. For the Wisconsin nursing home data, the more flexible generalized gamma model fits best compared with the generalized linear models.


BEIRLANT, JAN, YURI GOEGEBEUR, ROBERT VERLAAK, AND PETRA VYNCKIER. 1998. Burr Regression and Portfolio Segmentation. Insurance: Mathematics and Economics 23: 231-50.

FREES, EDWARD W. 2004. Longitudinal and Panel Data: Analysis and Applications in the Social Sciences. Cambridge: Cambridge University Press.

HABERMAN, STEVEN, AND ARTHUR E. RENSHAW. 1996. Generalized Linear Models and Actuarial Science. The Statistician 45: 407-36.

KALBFLEISCH, JOHN D., AND ROSS L. PRENTICE. 2002. The Statistical Analysis of Failure Time Data. New York: John Wiley and Sons.

KLUGMAN, STUART A., HARRY H. PANJER, AND GORDON E. WILLMOT. 2004. Loss Models: From Data to Decisions. New York: John Wiley and Sons.

LAWLESS, JERALD F. 2003. Statistical Models and Methods for Lifetime Data. New York: John Wiley and Sons.

MANNING, WILLARD G., ANIRBAN BASU, AND JOHN MULLAHY. 2005. Generalized Modeling Approaches to Risk Adjustment of Skewed Outcomes Data. Journal of Health Economics 24: 465-88.

McCULLAGH, PETER, AND JOHN A. NELDER. 1989. Generalized Linear Models. 2nd ed. London: Chapman & Hall. First published in 1983.

McDONALD, JAMES B., AND RICHARD J. BUTLER. 1990. Regression Models for Positive Random Variables. Journal of Econometrics 43: 227-51.

PIERCE, DONALD A., AND DANIEL W. SCHAFER. 1986. Residuals in Generalized Linear Models. Journal of the American Statistical Association 81: 977-86.

PRENTICE, ROSS L. 1974. A Log Gamma Model and Its Maximum Likelihood Estimation. Biometrika 61(1): 539-44.

STACY, E. W. 1962. A Generalization of the Gamma Distribution. Annals of Mathematical Statistics 33: 1187-92.



* Jiafeng Sun is a PhD candidate in Actuarial Science, Risk Management and Insurance, University of Wisconsin-Madison, 975 University Ave., Madison, WI 53706, jiafengsun@wisc.edu.

[dagger] Edward W. Frees, FSA, PhD, is Assurant Health Professor of Actuarial Science with the School of Business, University of Wisconsin-Madison, 975 University Ave., Madison, WI 53706, jfrees@bus.wisc.edu.

[double dagger] Marjorie A. Rosenberg, FSA, PhD, is Associate Professor in Actuarial Science, Risk Management and Insurance and Biostatistics and Medical Informatics, University of Wisconsin-Madison, 975 University Ave., Madison, WI 53706, mrosenberg@bus.wisc.edu.

Copyright Society of Actuaries Apr 2006
