“A regime-switching model of long-term stock returns” by Mary Hardy, April 2001

“A regime-switching model of long-term stock returns” by Mary Hardy, April 2001 / Reply

Klein, Gordon E


Dr. Hardy is to be congratulated for writing this very interesting paper. I read it right after teaching two classes covering the material on Exam 4, and I think that it provides a wonderful example of using the methods of that exam (fitting of model parameters using the method of maximum likelihood, likelihood ratio tests, Schwartz Bayesian Criterion, and Akaike Information Criterion). I would recommend it as a supplement to anybody teaching that material.

The objective of the paper is summed up by the title: to find a model for long-term stock returns. In particular, Dr. Hardy describes a 10-year European put option with a strike price of 75% or 100% of the stock index value at contract inception.

The paper actually develops a model for short-term (one-month) stock returns, a model that is preferred to other candidates under the criteria of the likelihood ratio test and similar tests. The main question that I want to address is this: Can this model for monthly stock returns produce a sufficiently good fit in the left tail of the implied distribution of long-term stock returns to yield a good estimate of the price of a 10-year put option?


The independent lognormal (ILN) model is convenient in that it leads to tractable results. However, as Dr. Hardy points out, empirical studies do not bear out the ILN model. There are too many outcomes in the extremes of the distributions, and the large outcomes and small outcomes tend to be bunched together.

The Regime-Switching (R-S) model of this paper incorporates both of these empirical phenomena much better than the ILN does. However, it does not scale to other time periods. This is not merely a loss of convenience. The model does not have what I like to think of as a “believable story” behind it.

The “story” behind the ILN model is that of Brownian motion: information is incorporated into stock prices continuously as it becomes known. With R-S, this story still holds, except that at the beginning of each month the volatility parameter of the Brownian motion may change to another possible value. (In this paper, there are two possible values for the volatility parameter.)

It is that monthly frequency that does not scale. For instance, if we changed the model to one with possible changes in the volatility parameter at weekly intervals, we would have two incompatible models. Assuming four weeks per month for convenience, the weekly-change model would have five possible variances for monthly data, corresponding to zero, one, two, three, or four weeks in which the low weekly variance parameter is in effect. Likewise, the monthly-change R-S model implies an annual model with 13 possible variance levels, but an annual-change R-S model would allow only one variance level within each year. Which is preferred?
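The counting argument above can be sketched numerically. This is a minimal illustration only; the two variance values are assumed, not taken from the paper.

```python
# A small sketch of the counting argument; the two variance values are
# assumed for illustration and are not taken from the paper.
s2_low, s2_high = 0.001, 0.004   # hypothetical per-period variances

def possible_variances(n_periods, lo, hi):
    """Variances of an n-period log-return when each period independently
    takes either the low or the high variance (variances of sums add)."""
    return sorted({k * lo + (n_periods - k) * hi for k in range(n_periods + 1)})

# Weekly-change model, four weeks per month: five possible monthly variances.
print(len(possible_variances(4, s2_low, s2_high)))    # 5
# Monthly-change model observed over a year: 13 possible annual variances.
print(len(possible_variances(12, s2_low, s2_high)))   # 13
```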

A model that would scale to any time period is one with two variance levels and a continuous-time Markov process for the changes between variance levels. (By the way, the continuous-time Markov process is a topic that has been removed from Exam 4.) The transition times are not restricted to a lattice. In such a model, the smallest monthly variance would result from being in the low-variance state for the entire month, and the largest would result from being in the high-variance state for the entire month. But the variance for any particular month could be anywhere in the interval between these endpoints. This model is the limiting case of the discrete-change-frequency R-S model as the change interval shrinks to zero.
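The continuous-time alternative can be sketched as follows: holding times in each state are exponentially distributed, and the month's variance is the variance rate integrated over the occupied states. All parameter values here are assumed purely for illustration.

```python
import numpy as np

# A hedged sketch of the continuous-time alternative: a two-state Markov
# process for the variance level, with exponentially distributed holding
# times. All parameter values are assumed for illustration.
rng = np.random.default_rng(1)
lam = (0.04, 0.20)        # transition intensities out of each state, per month (assumed)
s2 = (0.001, 0.004)       # instantaneous variance rates in each state (assumed)

def month_variance(rng, state=0):
    """Integrated variance over one month, starting in the given state."""
    t, var = 0.0, 0.0
    while t < 1.0:
        hold = min(rng.exponential(1.0 / lam[state]), 1.0 - t)
        var += hold * s2[state]
        t += hold
        state = 1 - state
    return var

vs = np.array([month_variance(rng) for _ in range(10_000)])
# The monthly variance can fall anywhere between the two endpoints:
print(s2[0] - 1e-9 <= vs.min() <= vs.max() <= s2[1] + 1e-9)   # True
```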

More generally, the variance parameter could change in continuous time and take on values from a continuum. This paper considers some models of this type and rejects them. They either do not fit the data as well as R-S, or they take too many parameters to attain their fit. I would be interested to see how a model such as that of the prior paragraph fits and whether any additional complications are justified by the benefits of scalability.


The Loss Models1 textbook discusses goodness-of-fit tests as the next step once a model has been selected in the manner of this paper. Parameters have been fitted to several models, and the likelihood ratio test and similar tests indicate the superiority of the R-S model within the set of models tested. But is there some goodness-of-fit test, along the lines of the chi-squared and Kolmogorov-Smirnov tests, to determine whether we can conclude that it is a good model and not simply the best of a set containing no good models?


Loss Models discusses the “Delta Method” for finding an approximate confidence interval of a function of parameter estimates. In this paper, the six parameters that were estimated were the means and variances of the two normal distributions (low and high volatility) and the transition probabilities.

Dr. Hardy discusses (p. 44) the high estimates of the standard deviations of the estimators of the transition probabilities. For example, for the TSE time series, the probability of shifting out of the low volatility state in a particular month has a point estimate of .0371 with a standard error of .012. Likewise, the probability of shifting from the high to the low volatility state has a point estimate of .2101 with a standard error of .086.
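The delta method mentioned above can be sketched with the quoted TSE estimates. The function g and the zero-covariance assumption below are illustrative choices of mine; the paper's actual covariance matrix is not reproduced here.

```python
import numpy as np

# A hedged sketch of the delta method applied to the TSE transition-probability
# estimates quoted above. The function g and the zero-covariance assumption
# are illustrative; the paper's covariance matrix is not reproduced here.
theta = np.array([0.0371, 0.2101])   # point estimates (p. 44)
se = np.array([0.012, 0.086])        # standard errors (p. 44)
Sigma = np.diag(se**2)               # assumes zero covariance, for illustration

def g(p):
    """Example function of the parameters: the long-run (stationary)
    probability of being in the high-volatility regime."""
    p12, p21 = p
    return p12 / (p12 + p21)

# Numerical gradient of g at theta, then Var(g) ~ grad' Sigma grad:
eps = 1e-6
grad = np.array([(g(theta + eps * e) - g(theta - eps * e)) / (2 * eps)
                 for e in np.eye(2)])
se_g = float(np.sqrt(grad @ Sigma @ grad))
print(round(g(theta), 3), round(se_g, 3))   # roughly 0.15 and 0.067
```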


The two-point mixture of normals is a model that reflects the empirical monthly data having more extreme values than would be expected with a single normal distribution. In fact, the independent two-point mixture of normals might be another model to consider. It has five parameters, one fewer than R-S, and it has almost the same distribution for S_120 as R-S. Would the method of maximum likelihood produce similar results with that model? Would it win out under the likelihood ratio test and its variations?
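The fat-tailed behaviour of the two-point mixture can be sketched by simulation. The five parameter values below are assumed for illustration only and are not fitted to any data.

```python
import numpy as np

# A sketch of the independent two-point mixture of normals: five parameters
# (two means, two standard deviations, one mixing weight), all values below
# assumed for illustration only.
rng = np.random.default_rng(2)
p = 0.85                       # weight on the low-volatility component (assumed)
mu1, s1 = 0.010, 0.035
mu2, s2 = -0.015, 0.080

n = 200_000
from_low = rng.random(n) < p   # each month's component drawn independently
x = np.where(from_low, rng.normal(mu1, s1, n), rng.normal(mu2, s2, n))

# The mixture is fatter-tailed than a single normal with the same variance:
z = (x - x.mean()) / x.std()
excess_kurtosis = float((z**4).mean() - 3.0)
print(excess_kurtosis > 0.5)   # True: noticeably leptokurtic
```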

Another model that might have been considered is the Independent Log-Stable-Paretian. The Stable Paretian distribution is a class that includes the normal. Its other members are more “fat-tailed” than the normal, so they can be fitted to data with more extreme values than a single normal distribution can accommodate.2

Since the R-S model does not scale, it would be interesting to see it fit with other time frequencies for the possible parameter shifts and to see how sensitive the 10-year put prices are to the change in that time frequency.


In Section 8 of the paper, Dr. Hardy says that one must have a “realistic fit in the left tail” of the distribution of the stock price at expiration of the put option to price it well. This is clearly the case, but I am not convinced that this model provides such a realistic fit. We have not been shown its fit for the past data. And, what evidence is there that the next 120 outcomes of the time series S_t will follow the same process as the past, even if we could say what process that was? The last several years of stock market returns make me think that the mean and variance may simply shift over time, not to values from a small set of known possibilities, but to values that would not have been thought “realistic” prior to their occurrence.

The history of the TSE index consists of approximately 4.5 disjoint 10-year periods. This seems inadequate to model future 10-year periods. It would also be hard to argue that the returns for each decade are outcomes of a single random process such as that of this paper.

One other test that I would find interesting is this: Delete the data for the most recent 120 months (in this paper, 1990-1999), and then estimate the parameters for the remaining data. Find the distribution of S_120, the index value at the end of 1999. At what percentile of this distribution is the actual outcome (for each index)?
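The proposed test can be sketched by simulation: draw the 120-month accumulation factor many times under a fitted regime-switching model and locate the realised outcome. The regime means and standard deviations below are assumed for illustration; only the transition probabilities echo the TSE estimates quoted earlier.

```python
import numpy as np

# A hedged sketch of the suggested out-of-sample test. The regime means and
# standard deviations are assumed for illustration; the transition
# probabilities echo the TSE estimates quoted earlier.
rng = np.random.default_rng(3)
mu = (0.012, -0.016)       # monthly log-return means by regime (assumed)
sigma = (0.035, 0.078)     # monthly log-return std devs by regime (assumed)
p12, p21 = 0.0371, 0.2101  # transition probabilities (TSE estimates, p. 44)

def simulate_s120(rng):
    """One draw of S_120, starting in the low-volatility regime."""
    regime, log_s = 0, 0.0
    for _ in range(120):
        log_s += rng.normal(mu[regime], sigma[regime])
        if rng.random() < (p12 if regime == 0 else p21):
            regime = 1 - regime
    return np.exp(log_s)

sims = np.array([simulate_s120(rng) for _ in range(5_000)])
actual = 2.917             # illustrative realised accumulation factor (assumed)
percentile = float((sims < actual).mean() * 100)
print(0.0 < percentile < 100.0)   # True: the outcome sits inside the simulated range
```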

Finally, it would be interesting to know how the finance literature addresses the modeling of long-term options, if at all. In Canada and the United States, options are traded with maturities as long as three years. If insurance companies have a need for 10-year options to hedge options that they have sold, one would think that it might be possible to find a seller in the financial markets.

Although I said earlier that I do not put much credence in the numerical values calculated for the put prices, I want to emphasize that this is not to say that the R-S method is bad in the sense that it can be greatly improved upon. Rather, I believe that it may do as good a job as can be done. The problem is that there may be answers quite different from those in this paper that I would also say do about as good a job as can be done. I think that we actuaries cannot put a value on a 10-year put option on a stock index with anything like the precision and confidence with which we can put a value on more traditionally actuarial future cash flows.



I thank Gordon Klein for his very considered comments on the paper. He has raised some interesting issues; I am grateful for the opportunity to address them.


Scale invariance is an attribute of the lognormal distribution; it requires, for example, that the accumulation factor for two consecutive months should have the same distribution as the accumulation factor over individual months, with appropriate adjustment to the parameters. The lognormal distribution is not the only scale invariant distribution; any log-stable distribution (which includes the lognormal distribution as a special case) would have the same feature. Scale invariance is certainly a very attractive characteristic of any model for investment returns. The advantage of having a single distribution describe the investment returns over any time unit is considerable. Also, as Mr. Klein states, there are reasons why scale invariance is appropriate, and there are substantial advantages in the mathematics. The problem is that this assumption is not supported by the data. Monthly, daily, and intra-day data all show autocorrelations that rule out scalability; scalability requires independent, identically distributed increments, and therefore zero autocorrelation.
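The scale-invariance property for the lognormal case can be checked numerically. The parameter values below are illustrative, not estimates: iid normal log-returns imply that the two-month accumulation factor is again lognormal, with both parameters doubled.

```python
import numpy as np

# A small numerical check of scale invariance under the ILN model; the
# parameter values are illustrative, not estimates from the paper.
rng = np.random.default_rng(4)
mu, sigma = 0.008, 0.04            # assumed monthly log-return parameters

z = rng.normal(mu, sigma, size=(500_000, 2))
two_month_log = z.sum(axis=1)      # log of the two-month accumulation factor

# Both parameters double, so the two-month factor is again lognormal:
print(round(float(two_month_log.mean()), 3))   # ~ 2 * mu      = 0.016
print(round(float(two_month_log.var()), 4))    # ~ 2 * sigma^2 = 0.0032
```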

Now we have a choice. To be consistent with scale invariance will require us to use models inconsistent with historical data. Moreover, the bias can be expensive, in that the log-stable distributions give tails that are too thin (compared with the data), leading to under-estimation of the risk from low returns. The data on stock returns show significant positive first-order autocorrelation when we measure by month or by week. Ignoring the autocorrelation gives thinner tails in the longer term accumulation factors than are supported by the autocorrelation structure in the data. Mr. Klein states that the model is inconsistent with the economic “story”. Well, then, either the story or the data are wrong; I choose to be guided by the historical data.

Mr. Klein sets a lot of store by what he terms a believable story. I would argue that the regime switching model has just as much claim to legitimacy as the scale invariance story and probably more so in an economic framework. In his original work, Hamilton makes the point that there are distinct macroeconomic regimes. Because equity returns are related to the expectations of future productivity, the regime-switching model in this paper has a very natural economic interpretation. There can be more than one believable story and perhaps some are more believable than others.


I agree with Mr. Klein that it is useful to have continuous time models for certain applications. A continuous time model with a regime-switching structure exists (Naik, 1993). However, if we take the continuous time regime-switching process and observe it in discrete time, we will have a different model from that described in this paper. I am not sure what the discrete time model derived from the continuous time regime-switching model would look like; I doubt that it would be particularly tractable. For applications using stochastic simulation, we would need to use a discrete time version. For fitting to the discrete time data available, we would need to use a discrete time version. If stochastic simulation is the main focus of the work, it seems reasonable to go straight to the discrete time model. Note that Mr. Klein’s objection to a model that is not derived first in continuous time rules out all discrete time series models, for example, the autoregressive and autoregressive conditionally heteroscedastic families used in economic applications. In fact, there is an enormous body of literature using these kinds of discrete time distributions for economic time series, in both the econometrics and the finance fields. We should not rule out all these immensely useful models because they cannot be derived from a unifying continuous time model.


Goodness-of-fit tests, such as the chi-squared test, are applicable to larger samples where the number of observations in various ranges can be compared with the expected numbers from the model tested. For smaller samples, as Mr. Klein notes, the Kolmogorov-Smirnov test can be used. In time series, the whole concept of sample size is a bit fuzzier. If we were to assume that returns in each time period were independent, we could treat the data as a random sample of 529 independent observations and could then apply one or other goodness-of-fit test. Once we allow for autocorrelation, we lose all the tests that depend on having independent observations; rather than a set of 529 individual observations, we have one observation of a 529-variate random variable. The goodness-of-fit tests don’t exist for single-observation data. We must determine whether the internal structure of the single observation of 529 values is consistent with the model. To accomplish this, we must use the model selection tests, such as AIC, SBC, and the likelihood ratio test, and subjective judgements such as the ability of the model to explain outliers like the return in October 1987.
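The selection criteria mentioned above can be sketched numerically. The log-likelihood values below are invented for illustration and are not the fitted values from the paper; only the sample size of 529 comes from the text.

```python
import math

# A minimal numerical sketch of the selection criteria; the log-likelihood
# values are invented for illustration and are not the fitted values.
n = 529                                   # monthly observations, as above

def aic(loglik, k):
    return -2 * loglik + 2 * k            # smaller is better

def sbc(loglik, k):
    return -2 * loglik + k * math.log(n)  # heavier penalty per parameter

ll_restricted, ll_full = 900.0, 907.6     # hypothetical nested pair, k = 5 vs 6
print(aic(ll_full, 6) < aic(ll_restricted, 5))   # True: AIC prefers the full model
print(sbc(ll_full, 6) < sbc(ll_restricted, 5))   # True: gain exceeds log(n)/2
```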


Mr. Klein has pointed out the high standard errors for the regime-two parameters. Because the process is not often in regime two, the estimates are, essentially, based on fewer observations than the regime one estimates. I agree that this is unfortunate. I am happy to consider ways to improve this situation. Using a longer data series might be one way, although, as the data extend back into the WWII period, there may be inappropriate effects on the estimates (unless you believe that the estimates should allow for the possibility of another world war). Mr. Klein suggests using the delta method for a confidence interval for the put option prices, which is a very good idea.

The delta method relies on the asymptotic properties (i.e., large sample properties) of the maximum likelihood estimator. These properties apply, with certain conditions, even for time series data where the data are not independent. However, further research on the estimation of the parameters using Bayesian methods indicates that the parameter estimates are far from normally distributed at this sample size, so we should not rely too heavily on the asymptotic properties. See Hardy (2001) for further details.

An interesting question is whether using a better-fitting model with some greater parameter uncertainty would give worse results than using a poorly fitting model with less parameter uncertainty. I think that the better fitting model would at least give a more accurate range of results, and I thank Mr. Klein for pointing out that the range may be quite broad.


The two-point mixture of normal distributions that Mr. Klein suggests could be modelled as a special case of the regime-switching model, where the probability of being in regime one (say) in month t to t + 1 is the same whether the previous month was spent in regime one or two. The likelihood ratio test p-value for this distribution compared with the two-regime model in the paper is approximately 10^-4. This indicates that the simpler model (that is, the two-point mixture) should be rejected in favour of the regime-switching model. This is not surprising because it is clear that autocorrelation in the data needs to be modelled, and it is not modelled in this structure.
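The likelihood ratio calculation behind a p-value of that order can be sketched as follows. The log-likelihoods below are invented to reproduce roughly the quoted order of magnitude; they are not the fitted values.

```python
import math

# A hedged sketch of the likelihood ratio test. The log-likelihoods are
# invented to give roughly the quoted order of magnitude (about 10^-4);
# they are not the fitted values from the paper.
ll_mixture, ll_rs = 900.0, 907.55   # restricted (5 params) vs full (6 params)
stat = 2 * (ll_rs - ll_mixture)     # asymptotically chi-squared, df = 1

# Survival function of chi-squared with 1 df: P(X > x) = erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(stat / 2))
print(p_value < 1e-3)               # True: reject the mixture in favour of R-S
```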

I have also compared the log-stable distributions suggested by Mr. Klein with the regime– switching distributions for the data used in the paper. For the S&P 500 data, the likelihood ratio test p-value is approximately 3 X 10^sup -4^. For the TSE 300 data, the likelihood ratio test p-value is less than 10^sup -4^. In both cases, all three model selection criteria favour the regime-switching model.


Mr. Klein asks what evidence there is that the next ten years of stock returns will follow the same process as previous years. I would reply, what else would you have us assume?

Insurers in the UK made an alternative assumption in the 1980s when they offered guaranteed annuity rates on variable annuity type contracts. The rates offered were out-of-the-money provided interest rates did not fall below around 6%. Rather than modelling using past data, the insurance company actuaries decided that such a fall was impossible and that there was no need to make any provision for these options. The actuaries were wrong, of course, and one company has been wound up and two more are in serious difficulty because of the costs arising from these guarantees. Had the actuaries used past data to model the options, they would certainly have come to a different conclusion. There may be adverse consequences when we ignore the objective data and rely too heavily on subjective judgement.

If insurers are to offer benefits that depend on stock returns, actuaries must model them. And, because we must model them, let us strive to model them as well as possible. I would be delighted to see a model perform better than the regime-switching model. I believe that we should be working to produce better and better models in all actuarial applications. What must not be allowed to happen is to let modelling itself fall into disrepute because we “cannot know” whether the past is an adequate representation of the future. The management of variable annuity contracts in the U.S., segregated fund contracts in Canada, and other equity-linked products around the world requires models of the assets and liabilities. While we offer these contracts, we need these models.

Mr. Klein asks for the re-estimation of the parameters without the final 10 years to determine the quantile where the final 10-year accumulation factor falls. This is called an out-of-sample test.

The actual accumulation factor using month end data for the TSE 300 index, from January 1, 1990, to December 31, 1999, was 2.917; this falls at the 55th percentile of the distribution using the pre-1990 data. So, while Mr. Klein may not be happy to use this model for the next ten years, it would have provided a reasonable estimate for the accumulation factor had we used the model with the information available at December 31, 1989.

George Box said, “All models are wrong, but some are useful” (Box 1976). I agree completely with Mr. Klein that the regime-switching model is “wrong”. It may nevertheless be useful.

1 Authored by Stuart Klugman, Harry Panjer, and Gordon Willmot. Published by John Wiley & Sons (New York) in 1998.

2 Klein, Gordon. “The Sensitivity of Cash-Flow Analysis to the Choice of Statistical Model for Interest-Rate Changes,” TSA XLV, pp. 79-124.

* Mary Hardy, A.S.A., F.I.A., is an Associate Professor of Actuarial Science in the Department of Statistics and Actuarial Science, University of Waterloo, Waterloo ON Canada N2L 3G1, e-mail: mrhardy@uwaterloo.ca.


BOX, G. 1976. “Science and Statistics.” Journal of the American Statistical Association. 71: 791-99.

HARDY, M. R. 2001. “Bayesian Risk Management for Equity Linked Insurance.” Scandinavian Actuarial Journal (forthcoming).

NAIK, V. 1993. “Option Valuation and Hedging Strategies with Jumps in the Volatility of Asset Returns.” Journal of Finance. 48(5): 1969-84.


* Gordon E. Klein, F.S.A., Lecturer, Dept. of Statistics and Actuarial Science, University of Iowa, Iowa City, IA 52242, e-mail: gklein@stat.uiowa.edu.

Copyright Society of Actuaries Jan 2002
