Stochastic Proximity Embedding (SPE): A Self-organizing Principle for Modeling Proximity Data
We present stochastic proximity embedding (SPE), a self-organizing algorithm for producing meaningful underlying dimensions from proximity data. SPE attempts to generate low-dimensional Euclidean embeddings that best preserve the similarities between a set of related observations. The embedding is carried out using an iterative pairwise refinement strategy that attempts to preserve local geometry while maintaining a minimum separation between distant objects. Unlike previous approaches, our method can reveal the underlying geometry of the data without intensive nearest-neighbor or shortest-path computations, and can reproduce the true geodesic distances of the data points in the low-dimensional embedding without requiring that these distances be estimated from the data sample. More importantly, the method scales linearly with the number of points, and can be applied to very large data sets that are intractable for conventional embedding procedures. SPE can be applied to any problem where nonlinearity complicates the use of conventional methods such as principal component analysis and multidimensional scaling, and where a sensible proximity measure can be defined. Because it seeks an embedding that is consistent with a set of upper and lower distance bounds, SPE can also be applied to an important class of distance geometry problems, including conformational analysis, NMR structure determination, and ligand docking. To that end, the basic self-organizing algorithm is extended to preserve not only inter-atomic distance bounds but also chiral constraints that enforce the planarity of conjugated systems and the correct chirality of stereocenters, as well as other types of constraints specific to the problem at hand. When applied to conformational analysis, we show that this approach produces excellent starting geometries that minimize to more diverse and energetically favorable conformations in a fraction of the time required by kindred techniques.
Ongoing efforts in other applications in structural chemistry and biology will be discussed.
Dimitris Agrafiotis, Johnson & Johnson Pharmaceutical Research & Development and Indiana University
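
The iterative pairwise refinement described in the abstract can be illustrated with a minimal sketch. It is not the speaker's implementation: the function name spe_embed, its parameters, the linear learning-rate schedule, and the use of Euclidean input distances as the proximity measure are all assumptions made for illustration. The update rule follows the way SPE is commonly described: pick a random pair, and if the two points are neighbors (input distance within a cutoff r_cut), or if they are non-neighbors that have ended up closer in the embedding than their input distance, move both points along the line joining them so that their separation approaches the input distance.

```python
import math
import random


def spe_embed(points, dim=2, r_cut=1.0, cycles=100, steps=2000,
              lr_start=1.0, lr_end=0.01, eps=1e-10, seed=0):
    """Sketch of stochastic proximity embedding (all names are illustrative).

    points : sequence of coordinate tuples in the original space
    r_cut  : neighborhood cutoff; pairs farther apart than this only act
             as lower bounds on the embedded distance
    """
    rng = random.Random(seed)
    n = len(points)
    # Start from random coordinates in the unit hypercube.
    coords = [[rng.random() for _ in range(dim)] for _ in range(n)]
    lr = lr_start
    # Assumed schedule: decrease the learning rate linearly per cycle.
    decay = (lr_start - lr_end) / max(cycles - 1, 1)
    for _ in range(cycles):
        for _ in range(steps):
            i, j = rng.sample(range(n), 2)            # random distinct pair
            r = math.dist(points[i], points[j])       # input proximity
            d = math.dist(coords[i], coords[j])       # embedded distance
            # Refine neighbors always; refine distant pairs only when
            # they are too close in the embedding.
            if r <= r_cut or d < r:
                scale = lr * 0.5 * (r - d) / (d + eps)
                for k in range(dim):
                    delta = scale * (coords[i][k] - coords[j][k])
                    coords[i][k] += delta             # move i away from / toward j
                    coords[j][k] -= delta             # move j symmetrically
        lr -= decay
    return coords
```

Calling spe_embed(points, dim=2, r_cut=0.5), for example, returns one list of two coordinates per input point. Each refinement step touches a single pair, so the total cost is cycles × steps pair updates and no full distance matrix is ever built, which is consistent with the linear scaling claimed above when the number of steps is chosen proportional to the number of points.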