Designer Drugs, Medicinal Chemistry, And Optimization
May 22, 1999
X-ray crystallographic structure of the enzyme elastase (medium gray) binding a small inhibitor molecule (light gray), shown on a black background. The atoms are rendered as polyhedra about as big as their hard sphere surfaces.
"The 'magic bullet' of medicinal chemistry is really a machine gun," warned Gordon Crippen, a professor of pharmaceutical chemistry at the University of Michigan, in an invited address at the SIAM Annual Meeting in Toronto. Administering a drug, he explained, is like spraying a magazine of bullets throughout the body. The intended target is a particular protein, a complex organic molecule whose behavior is to be altered for the better when a drug molecule binds to a receptor site on the target.
Some of the medicinal bullets will glance harmlessly off other proteins. But others could strike unintended victims, altering the behavior of innocent proteins and producing undesirable, perhaps even fatal, side effects. Better designed bullets might be simultaneously more effective against the targets at which they are aimed and less likely to cause damage to bystanders.
The challenge of receptor-based drug design is to deduce models of protein receptor sites---the shapes of the targets---in order to predict their vulnerability to new "bullets," i.e., their propensity to bind alternative drug molecules. Given experimental data about the binding of a few, relatively small drug molecules to one large, complex protein molecule, what models of the protein receptor sites can be deduced to guide the development of new drugs?
Crippen uses integer programming techniques to produce such receptor site models, with estimates of their accuracy. The essence of the model is a set of parameters that explain observed binding activity. "In terms of pure science," he said, "this is a remarkable challenge in inductive reasoning: Starting with geometrically flexible, small molecules and an interval associated with each, derive the simplest three-dimensional site model, or all possible models, or the model that is most justifiable in some sense."
Even without that intellectual challenge, the stakes are high enough to make computer-aided drug design a game worth playing. The U.S. pharmaceutical industry alone spends more than $7 billion annually on research. Guiding a single new medicine from discovery to physicians' offices can take as long as 15 years and cost $125 million. Computational drug design can shorten development cycles and improve the effectiveness of the end products.
Geometry and Energetics
Proteins are three-dimensional structures, but complex geometry alone does not explain the extraordinary expense and effort needed to develop new drugs. In spite of suggestive key-and-lock analogies for the binding of drugs to receptor sites, there is much more at work than just geometry.
"Given the atoms and bonds that constitute the chemical structure of a small molecule," Crippen said, "it costs about one hour of workstation CPU time to calculate a broad range of the possible three-dimensional structures which that molecule might assume. And any one of those might be the bound conformation. The three-dimensional structure of about 6000 protein molecules has been determined experimentally at a cost of roughly one person-year each, but most receptor structures remain unknown."
Information about drug-receptor energetics augments the geometric picture. Some of these energy calculations are relatively easy; others are quite difficult, requiring upward of 100 supercomputer CPU hours.
Without models for guidance, drug design requires either brute force or chemical intuition. In principle, brute force could couple a combinatorial approach to drug synthesis with mass screening, although, Crippen cautioned, "The number of possible drug molecules is estimated to exceed the number of atoms in the universe!" On the other hand, "chemical intuition---an inspired guess about what compound to synthesize next---produces fewer than ten compounds per chemist per week and perhaps one successful drug per chemist per career.
"For every paper in the Journal of Medicinal Chemistry where receptor structure is known, there are two papers where the (binding) mechanism is unknown and three where the receptor is identified but has unknown structure."
Standard Methods for Drug Design
When receptor structure is not known, the simplest approach to drug design is to assume that homologous proteins---those with some obvious similarities in portions of their amino acid sequences---will have analogous similarities in their binding modes. The extent to which homology can be exploited is limited by the discontinuity in the map from structure to binding activity.
An alternative approach, comparative molecular field analysis (CoMFA), studies atomic forces independent of the protein. Members of a family of drug molecules are superimposed on one another in some optimal way, although the definition of "optimal" varies with the setting.
The molecular field of each drug molecule is described by the energy of the molecule's interactions with several different imaginary probe particles, say an electron and a carbon atom, placed at predetermined points in a spatial grid. That set of interaction values is fit via least squares to a vector of parameters that characterize the binding activity of the drug molecule. CoMFA results depend on the arrangement of the initial superposition, the probes, the density of the grid, and its orientation relative to the molecules.
Another approach, quantitative structure-activity relationships (QSAR), seeks a measurable link between biological activity and some subset of a broad range of chemical behavior parameters, such as rate constants in certain reactions. Typically, QSAR is applied to a family of homologous compounds, each of which is assumed to bind to the target protein in essentially the same way. Some optimal, relatively small subset of the chemical properties is identified, and biological activity is fit via least squares to those parameters. Crippen asks the hard question about that fit: Is it statistically significant?
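The least-squares step and Crippen's question about significance can be illustrated with a minimal sketch. The descriptor values and activities below are made up for illustration, and the function name fit_qsar is hypothetical; the sketch fits activity to two descriptors plus an intercept and reports R^2 and the overall F statistic, which would then be compared against an F distribution to judge significance.

```python
import numpy as np

def fit_qsar(descriptors, activity):
    """Ordinary least-squares fit of activity to descriptors plus an intercept.

    Returns the coefficients, R^2, and the F statistic for the whole regression.
    """
    descriptors = np.asarray(descriptors, dtype=float)
    activity = np.asarray(activity, dtype=float)
    X = np.column_stack([np.ones(len(activity)), descriptors])
    coef, _, _, _ = np.linalg.lstsq(X, activity, rcond=None)
    predicted = X @ coef
    ss_res = float(np.sum((activity - predicted) ** 2))
    ss_tot = float(np.sum((activity - activity.mean()) ** 2))
    n, p = X.shape  # p includes the intercept column
    r2 = 1.0 - ss_res / ss_tot
    f_stat = ((ss_tot - ss_res) / (p - 1)) / (ss_res / (n - p))
    return coef, r2, f_stat

# Hypothetical data: six compounds, two chemical descriptors each.
descriptors = [[0.1, 1.0], [0.5, 0.8], [0.9, 1.5],
               [1.3, 0.7], [1.7, 1.9], [2.1, 1.1]]
noise = [0.05, -0.05, 0.05, -0.05, 0.05, -0.05]
activity = [1.0 + 2.0 * x1 - 0.5 * x2 + e
            for (x1, x2), e in zip(descriptors, noise)]

coef, r2, f_stat = fit_qsar(descriptors, activity)
```

With only six compounds and three fitted parameters, just three degrees of freedom remain for the residual, which is one reason a high R^2 on a small family of compounds does not by itself establish significance.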
Such reservations notwithstanding, QSAR has had some notable successes. A QSAR analysis of more than 70 related compounds, for example, guided the design of the antibacterial agent norfloxacin, which is used to treat urinary tract infections. The analysis identified key chemical properties and the locations of important components of the drug molecule. Widely distributed herbicides and fungicides have been developed in a similar way.
Receptor Sites from Binding Data
Instead of assumptions about alignment, binding modes, or receptor structure, Crippen begins with the chemical structures of several drug molecules and data about their binding affinities for a particular receptor site on a target molecule. He uses this so-called training set to predict binding affinities of various groups of atoms, or binding sites, in the target molecule.
The atoms in the drug molecules are characterized by three parameters that reflect the contribution of each atom to three specific properties of the molecule: charge distribution, molar refractivity (a property correlated with molecular weight and polarizability), and hydrophobicity (preference for a nonpolar, or "oily," environment over an aqueous one).
The prospective binding sites are associated with regions on the target molecule. The hydrophobicity, charge, and polarizability of the atoms in each region then determine that region's three interaction parameters. Crippen defines a binding mode as a "global optimum of the internal energy of the molecule plus the estimated free energy of the interaction between the drug molecule and the receptor site."
The value of that optimum energy is the calculated binding affinity. A central tenet of Crippen's approach is the requirement that the calculated binding affinity fall within the error bars of the measured binding affinity for each of the drug compounds in the training set.
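The error-bar requirement amounts to an interval test on each training compound. A minimal sketch follows; the function names and the numerical affinities are hypothetical, chosen only to show the test.

```python
def within_error_bars(calculated, measured_low, measured_high):
    """True if a calculated affinity lies inside the measured interval."""
    return measured_low <= calculated <= measured_high

def model_fits_training_set(calculated, intervals):
    """A site model is acceptable only if every compound passes the test."""
    return all(within_error_bars(c, lo, hi)
               for c, (lo, hi) in zip(calculated, intervals))

# Hypothetical measured intervals (arbitrary energy units) for three compounds.
intervals = [(-9.5, -8.5), (-7.2, -6.0), (-5.1, -4.4)]
good_model = [-9.0, -6.5, -4.8]   # all inside their intervals
bad_model = [-9.0, -5.5, -4.8]    # second compound falls outside

fits_good = model_fits_training_set(good_model, intervals)
fits_bad = model_fits_training_set(bad_model, intervals)
```

Treating the data as intervals rather than points is what distinguishes this requirement from a least-squares fit: a model is either consistent with every measurement or it is rejected.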
Loosely, the model that results from the application of a given training set to a receptor molecule is a set of binding site geometries. The model is generally not unique, particularly if there are just a few molecules in the training set or if the experimental binding data are imprecise.
The Optimization Problem
Crippen is left with a mixed-integer programming problem whose objective function seeks a binding site geometry that is minimally restrictive. The continuous control variables are the energy parameters for each prospective binding site and the upper and lower bounds on the distances between those sites. Boolean variables are used to determine whether a binding mode is energetically and geometrically allowable.
The continuous constraints restrict the energy and the geometry of allowable binding modes. The calculated binding energies must lie within the range of the experimental measurements, and the conformation of the target molecule must be consistent with the geometry of its binding regions. The discrete constraints require that every drug molecule have at least one allowable binding mode and that each binding mode be either allowed or rejected for a specific geometric incompatibility. (A binding mode is geometrically incompatible if, for example, the shortest distance between two regions in the protein model is 10 Angstroms while the longest distance between atoms of the drug molecule that bind in the two regions is only 9 Angstroms.)
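The geometric test in the parenthetical example can be sketched as an interval-overlap check. The data layout here is an assumption made for illustration: the site model supplies lower and upper distance bounds for each pair of regions, the flexible drug molecule supplies the range of distances each pair of atoms can reach, and a binding mode maps atoms to regions. The atom and region names are hypothetical.

```python
from itertools import combinations

def binding_mode_compatible(mode, site_bounds, atom_bounds):
    """Check a binding mode against the distance bounds of a site model.

    mode        maps atom name -> region name
    site_bounds maps frozenset of two regions -> (lower, upper) distance
    atom_bounds maps frozenset of two atoms   -> (shortest, longest) distance
    """
    for a, b in combinations(sorted(mode), 2):
        ra, rb = mode[a], mode[b]
        if ra == rb:
            continue  # same region: no inter-region distance in this sketch
        site_lo, site_hi = site_bounds[frozenset((ra, rb))]
        atom_lo, atom_hi = atom_bounds[frozenset((a, b))]
        # Incompatible when the two distance intervals cannot overlap.
        if atom_hi < site_lo or atom_lo > site_hi:
            return False
    return True

# The article's example: regions at least 10 Angstroms apart,
# atoms that can be at most 9 Angstroms apart.
site_bounds = {frozenset(("R1", "R2")): (10.0, 12.0)}
mode = {"N1": "R1", "O7": "R2"}
too_short = {frozenset(("N1", "O7")): (6.0, 9.0)}
long_enough = {frozenset(("N1", "O7")): (6.0, 10.5)}

rejected = binding_mode_compatible(mode, site_bounds, too_short)
allowed = binding_mode_compatible(mode, site_bounds, long_enough)
```

In the optimization problem, the outcome of a test like this is what each Boolean variable records for a candidate binding mode.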
The core of his solution approach is branch and bound followed by a cutting plane. The branch-and-bound search finds values of the Boolean variables that, together with the continuous variables, satisfy the constraints. The objective function favors the least restrictive site geometry available. Once a solution has been found, a cutting plane excludes that site geometry from further consideration so that new ones can be discovered.
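The interplay of branch and bound with an excluding cut can be shown on a toy problem. This is a sketch under stated assumptions, not Crippen's formulation: the continuous variables are omitted, the only constraint is that every molecule keeps at least one allowed binding mode, the nonnegative costs are hypothetical stand-ins for a restrictiveness penalty, and the cut is modeled as a simple exclusion set. As a linear inequality on Boolean variables, excluding a solution x* would read: sum over i with x*_i = 1 of (1 - x_i), plus sum over i with x*_i = 0 of x_i, is at least 1.

```python
def branch_and_bound(costs, groups, excluded):
    """Minimize sum(costs[i] * x[i]) over Boolean x such that every group
    contains at least one index i with x[i] = 1 and x is not in excluded.
    Costs are assumed nonnegative, so a partial sum is a valid lower bound.
    """
    n = len(costs)
    best = {"value": float("inf"), "x": None}

    def can_still_satisfy(x):
        # Unassigned variables (index >= len(x)) could still be set to 1.
        k = len(x)
        return all(any(i >= k or x[i] for i in group) for group in groups)

    def search(x, value):
        if value >= best["value"]:      # bound: cannot beat the incumbent
            return
        if not can_still_satisfy(x):    # prune infeasible subtrees
            return
        if len(x) == n:
            if tuple(x) not in excluded:
                best["value"], best["x"] = value, tuple(x)
            return
        i = len(x)
        search(x + [0], value)              # branch: x[i] = 0
        search(x + [1], value + costs[i])   # branch: x[i] = 1

    search([], 0.0)
    return best["x"], best["value"]

def enumerate_models(costs, groups, limit):
    """Repeatedly solve, then cut off each solution so a new one is found."""
    excluded, found = set(), []
    while len(found) < limit:
        x, value = branch_and_bound(costs, groups, excluded)
        if x is None:
            break
        found.append((x, value))
        excluded.add(x)
    return found

# Hypothetical data: molecule A has modes 0 and 1, molecule B has modes 2 and 3.
costs = [1.0, 2.0, 1.5, 3.0]
groups = [(0, 1), (2, 3)]
models = enumerate_models(costs, groups, limit=3)
```

Each pass returns the best solution not yet excluded, so the loop lists alternative models in order of increasing penalty. The order in which variables are branched on is fixed here; choosing that order well is what keeps a realistic search small.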
A typical problem with realistic data might have six molecules in the training set, 18 continuous variables describing the geometry and energetics of the site model, and 500 Boolean variables representing the geometric compatibility between possible binding modes of the molecules and the site model. Since a few of the Boolean variables are much more important than the others, Crippen carefully orders the levels of the branch-and-bound search in such a way that solutions can be found by exploring fewer than 10^4 branches (rather than 2^500). Typical run times are about an hour on a Silicon Graphics R5000 workstation.
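The gap between the two branch counts is easy to make concrete with exact integer arithmetic. The variable names below are illustrative only.

```python
# Size of the full Boolean search tree versus the branches actually explored.
full_tree_leaves = 2 ** 500
explored_branches = 10 ** 4

# Number of decimal digits in 2^500.
digits_in_full_tree = len(str(full_tree_leaves))

# Fraction of the full tree that an ordered search needs to visit.
fraction_explored = explored_branches / full_tree_leaves
```

A full enumeration would involve a number with 151 decimal digits, so the ordered search visits a vanishingly small fraction of the tree.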
Crippen's method has proved to be quite effective on the standard test cases. Nonetheless, he modestly described it as being "as unsuccessful as anyone else's method," although he does admit that it obtains good predictions of structure from very small training sets. Clearly, his use of modern optimization tools and models has moved the process of receptor-based drug design out of the realms of statistically uncertain least squares, heavy dependence on intuition, and restrictive assumptions about molecular structure and alignment.
The Contribution of Computational Chemistry
As appealing as it may seem to applied mathematicians, the notion of start-to-finish drug design at a computer terminal is unrealistic. "Older, more standard methods, such as QSAR or CoMFA, for which there are convenient commercial software packages, are used daily in all the big companies," Crippen observed. "But there are remarkably few success stories that computational chemists can cite."
The difficulty, he pointed out, is that no existing computer program can read in the experimental evidence and write out the chemical structure of a drug that will pass all the biological tests and then go on to become a great commercial and medical success. "Instead, there is a general 'feeling' that the odds of producing a successful drug are raised by employing computational chemistry."
Given the enormous cost of drug development, no one will perform what Crippen sees as the obvious test: "Build two matched companies, each with one hundred equally good synthetic chemists, biochemists, toxicologists, etc., but add ten computational chemists to one of them and then analyze their output after ten years!"
He described his favorite analogy as "determining the parents (intellectual creators) of a child (drug). The mother (the synthetic organic chemist) is obvious because she (he/it) was present at the birth (synthesis of the drug). The father might have been anyone passing by (random inspiration) or some close friend (the computational chemist down the hall), but that contribution is recognized only if the mother decides to make a public statement about it."
Questions of paternity aside, it is clear that computational chemistry and the underlying tools of modeling, optimization, and scientific computing are key ingredients, both intellectually and commercially, in the most effective prescription yet for the development of new medicines.
Paul Davis is a professor of mathematical sciences at Worcester Polytechnic Institute.