Abstract: Three-dimensional models, or pharmacophores, describing Euclidean constraints on the location of functional groups (like hydrophobic groups, hydrogen acceptors and donors, etc.) on small molecules, are often used in drug design to describe the medicinal activity of potential drugs (or 'ligands'). This medicinal activity is produced by interaction of the functional groups on the ligand with a binding site on a target protein. In identifying structure-activity relations of this kind there are three principal issues: (1) It is often difficult to "align" the ligands in order to identify common structural properties that may be responsible for activity; (2) Ligands in solution can adopt different shapes (or 'conformations') arising from torsional rotations about bonds. The 3-D molecular substructure is typically sought on one or more low-energy conformers; and (3) Pharmacophore models must, ideally, predict medicinal activity on some quantitative scale. It has been shown that the logical representation adopted by Inductive Logic Programming (ILP) naturally resolves many of the difficulties associated with the alignment and multi-conformation issues. However, the predictions of models constructed by ILP have hitherto only been nominal, predicting medicinal activity to be present or absent. In this paper, we investigate the construction of two kinds of quantitative pharmacophoric models with ILP: (a) Models that predict the probability that a ligand is "active"; and (b) Models that predict the actual medicinal activity of a ligand. Quantitative predictions are obtained by utilising the following statistical procedures as background knowledge: logistic regression and naive Bayes, for probability prediction; linear and kernel regression, for activity prediction. The multi-conformation issue and, more generally, the relational representation used by ILP result in some special difficulties in the use of any statistical procedure. We present the principal issues and some solutions.
Specifically, using data on the inhibition of the protease Thermolysin, we demonstrate that it is possible for an ILP program to construct good quantitative structure-activity models. We also comment on the relationship of this work to other recent developments in statistical relational learning.
