Validation of (Q)SAR Models – Correspondence

Klaus L.E. Kaiser

From a practitioner’s point of view (but not having been part of the workshop), I feel compelled to comment on “Summary of a Workshop on Regulatory Acceptance of (Q)SARs for Human Health and Environmental Endpoints” by Jaworska et al. (2003).

There are a variety of quantitative structure-activity relationship [(Q)SAR] models available for a variety of purposes, and, as stated by Jaworska et al. (2003), predictive power is a critical issue in evaluating any model. Regrettably, the accompanying articles by Eriksson et al. (2003) and Cronin et al. (2003a, 2003b) fail to mention any of the recent publications on the application of probabilistic neural networks (PNNs) for the modeling of toxicity endpoints. Highly effective PNN models have been demonstrated for the fathead minnow (Kaiser and Niculescu 1999), the water flea Daphnia magna (Kaiser and Niculescu 2001a), the ciliate Tetrahymena pyriformis (Niculescu et al. 2000), the Microtox bacterium Vibrio fischeri (Kaiser and Niculescu, in press), and estrogen receptor binding affinity (Kaiser and Niculescu 2001b). Indeed, Moore et al. (2003) have shown that the fathead minnow PNN has superior performance in essentially all aspects when compared to the other methods. Other types of neural networks have similarly been shown to be robust and to provide optimal predictions (e.g., Burden and Winkler 1999). Furthermore, commercial programs using PNN methodology have recently become available for the estimation of several toxicologic endpoints, such as fathead minnow 96-hr median lethal concentrations (LC50) (TerraBase, Inc. 2002), rat and mouse intravenous LD50 (TerraBase, Inc. 2003a), and estrogen receptor binding affinity (TerraBase, Inc. 2003b).

Although the representativity and the domain of a model are good concepts in theory, they are difficult to define or use in practice. Moreover, the statistical descriptors of a model’s performance (such as goodness of fit, specificity, sensitivity, transparency, and similarity) are often misleading because the applied data set(s) for many (Q)SARs are narrow, skewed, or otherwise nonrepresentative of the chemical world existing in reality. In most cases, a model user cannot ascertain whether a particular model may or may not be used for a particular compound and endpoint to be estimated. Without tests of comparative performance, this conundrum exists for users of most models. Even for quite similar compounds, model outputs can differ by several orders of magnitude, both from one model to another and from measured values. For example, predictions of octanol/water partition coefficients (a physical property) for a small set of quite similar compounds by commonly used models show a large divergence of values (Vrakas et al. 2003). Therefore, the (only) proof of model accuracy is in the testing of each model’s performance against a broad spectrum of measured data that are not part of the training set of that model. In practice, this means that performance of a model should be the driving force for its acceptability in the regulatory world, not its statistics.
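The kind of test argued for here can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `external_validation`, the compound identifiers, and the two summary figures (root-mean-square error and the fraction of predictions within one order of magnitude of the measured value) are hypothetical choices made for the example, not a procedure taken from any of the cited studies. Values are assumed to be on a log10 scale.

```python
import math

def external_validation(predict, measured, training_ids):
    """Score a model only on compounds that were NOT in its training set.

    predict      -- callable: compound id -> predicted value (log10 units)
    measured     -- dict: compound id -> measured value (log10 units)
    training_ids -- set of compound ids used to train the model
    (All names are hypothetical and exist only for this sketch.)
    """
    # Drop every compound the model has already seen during training.
    external = {cid: y for cid, y in measured.items() if cid not in training_ids}
    if not external:
        raise ValueError("no external compounds left to test on")

    errors = [predict(cid) - y for cid, y in external.items()]
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    # Fraction of predictions within one order of magnitude of the measurement.
    within_one_log = sum(abs(e) <= 1.0 for e in errors) / len(errors)
    return {"n": len(errors), "rmse": rmse, "within_one_log": within_one_log}


# Made-up numbers, for illustration only.
measured = {"a": 1.0, "b": 2.0, "c": 3.0, "d": 0.0}
model_output = {"a": 1.0, "b": 2.5, "c": 2.5, "d": 2.0}
result = external_validation(model_output.get, measured, training_ids={"a"})
print(result)
```

Running the same function over several models with one shared set of measured data would give the comparative picture the text calls for; a perfect score on compound "a" above counts for nothing, because that compound was part of the training set.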

Regular scrutiny of performance has been commonplace in other areas. For example, the performance of Canadian environmental analytical laboratories is regularly checked with round robin testing. The predictive power of carcinogenicity and mutagenicity models has been evaluated in several rounds of testing, with the biological testing subsequent to the models’ predictions. There is a great need for such comparative testing of the usefulness of various existing (Q)SAR models. The valiant performance testing of several toxicity-prediction (Q)SAR models by Moore et al. (2003) shows some surprising results and further gives credence to this thought. Indeed, Jaworska et al. (2003) also stress the need for an independent organization to validate data and models irrespective of any model’s claims.

The author is the director of research and a principal of TerraBase, Inc.

Klaus L.E. Kaiser

TerraBase, Inc.

Hamilton, Ontario, Canada



Burden FR, Winkler DA. 1999. Robust QSAR models using Bayesian regularized neural networks. J Med Chem 42:3183-3187.

Cronin MTD, Jaworska JS, Walker JD, Comber MH, Watts CD, Worth AP. 2003a. Use of QSARs in international decision-making frameworks to predict health effects of chemical substances. Environ Health Perspect 111:1391-1401.

Cronin MTD, Walker JD, Jaworska JS, Comber MH, Watts CD, Worth AP. 2003b. Use of QSARs in international decision-making frameworks to predict ecologic effects and environmental fate of chemical substances. Environ Health Perspect 111:1376-1390.

Eriksson L, Jaworska J, Worth AP, Cronin MTD, McDowell RM, Gramatica P. 2003. Methods for reliability and uncertainty assessment and for applicability evaluations of classification and regression-based QSARs. Environ Health Perspect 111:1361-1375.

Jaworska JS, Comber M, Auer C, Van Leeuwen CJ. 2003. Summary of a workshop on regulatory acceptance of (Q)SARs for human health and environmental endpoints. Environ Health Perspect 111:1358-1360.

Kaiser KLE, Niculescu SP. 1999. Using probabilistic neural networks to model the toxicity of chemicals to the fathead minnow (Pimephales promelas): a study based on 885 compounds. Chemosphere 38:3237-3245.

–. 2001a. Modeling the acute toxicity of chemicals to Daphnia magna: a probabilistic neural network approach. Environ Toxicol Chem 20:420-431.

–. 2001b. On the PNN modelling of estrogen receptor binding data for carboxylic acid esters and organochlorine compounds. Water Qual Res J Canada 36:619-630.

–. In press. Neural network modeling of Vibrio fischeri toxicity data with structural physico-chemical parameters and molecular indicator variables. In: QSARs for Predicting Ecological Effects of Chemicals (Walker JD, ed). Pensacola, FL:SETAC Press.

Moore DRJ, Breton RL, MacDonald DB. 2003. A comparison of model performance for six quantitative structure-activity relationship packages that predict acute toxicity to fish. Environ Toxicol Chem 22:1799-1809.

Niculescu SP, Kaiser KLE, Schultz TW. 2000. Modeling the toxicity of chemicals to Tetrahymena pyriformis using molecular fragment descriptors and probabilistic neural networks. Arch Environ Contam Toxicol 39:289-298.

TerraBase, Inc. 2002. TerraQSAR–FHM. Fish Toxicity Computation Program. Available: [accessed 7 January 2004].

–. 2003a. TerraQSAR–RMIV. Rat/Mouse intravenous LD50 Computation Program. Available: http://www. [accessed 7 January 2004].

–. 2003b. TerraQSAR–E2-RBA. Estrogen Receptor Binding Affinity Computation Program. Available: http:// [accessed 7 January 2004].

Vrakas D, Tsantili-Kakoulidou A, Hadjipavlou-Litina D. 2003. Exploring the consistency of logP estimation for substituted coumarins. QSAR Combin Sci 22:622-629.

COPYRIGHT 2004 National Institute of Environmental Health Sciences
