Using a Bayesian Distance Measure to Combine Rare Event Definitions

William DuMouchel
Lincoln Technologies

Associations among rare events in sparse databases are hard to measure because the counts are small. For example, the US FDA has collected a few million adverse drug reaction reports, but the system contains about 5,000 distinct drug products and 9,000 distinct adverse event codes, so most drug-event code combinations are rare. The event coding vocabulary, the Medical Dictionary for Regulatory Activities (MedDRA), does include a hierarchical grouping of its terms, but the groupings are not optimal for detecting adverse drug reactions. For example, "Blood pressure increased" and "Blood pressure decreased" are grouped together, and such groupings tend to dilute signals of drug-event associations.

This talk describes an attempt to group the 9,000 MedDRA "Primary Terms" according to their statistical associations with the 5,000 drugs in the database. A Bayesian estimate of the strength of every drug-event association is used to define a distance measure between pairs of events, after which a standard agglomerative clustering algorithm combines the events into about 1,800 event groupings. The association-finding algorithm is then rerun using event groups instead of individual events, and the results are compared with the original analysis.
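
The pipeline described above (shrunken association estimates, then a distance between events, then agglomerative clustering) can be illustrated with a minimal sketch. The abstract does not specify the Bayesian model, the distance, or the linkage rule, so everything below is an assumption chosen for illustration: a single Gamma(a, a) prior on the observed-to-expected ratio, whose posterior mean under a Poisson model is (N + a) / (E + a); Euclidean distance between the events' log-ratio profiles across drugs; and average linkage. The function names and the toy count table are hypothetical.

```python
import math

def shrunken_log_ratios(counts, prior=0.5):
    """counts[d][e] = number of reports naming drug d and event e.

    Returns log2 of a shrunken observed/expected ratio for each cell.
    Assumption: Gamma(prior, prior) prior on the ratio, Poisson counts,
    so the posterior mean is (N + prior) / (E + prior).
    """
    row = [sum(r) for r in counts]
    col = [sum(c) for c in zip(*counts)]
    total = sum(row)
    return [
        [math.log2((n + prior) / (row[d] * col[e] / total + prior))
         for e, n in enumerate(r)]
        for d, r in enumerate(counts)
    ]

def event_distance(scores, e1, e2):
    """Euclidean distance between two events' association profiles."""
    return math.sqrt(sum((r[e1] - r[e2]) ** 2 for r in scores))

def cluster_events(scores, n_groups):
    """Naive average-linkage agglomerative clustering of event columns."""
    clusters = [[e] for e in range(len(scores[0]))]

    def linkage(a, b):
        pairs = [event_distance(scores, i, j) for i in a for j in b]
        return sum(pairs) / len(pairs)

    while len(clusters) > n_groups:
        # Find the closest pair of clusters and merge them.
        i, j = min(
            ((i, j) for i in range(len(clusters))
             for j in range(i + 1, len(clusters))),
            key=lambda p: linkage(clusters[p[0]], clusters[p[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]

# Hypothetical toy table: 3 drugs (rows) by 4 events (columns).
counts = [
    [20, 18, 1, 2],
    [5, 6, 5, 4],
    [1, 2, 19, 21],
]
scores = shrunken_log_ratios(counts)
groups = cluster_events(scores, n_groups=2)
print(groups)
```

In this toy table, events 0 and 1 share one drug profile and events 2 and 3 share another, so a reasonable grouping pairs them accordingly. The naive merge loop is only practical for small inputs; at the scale described here (about 9,000 events) an optimized library routine for hierarchical clustering would be the more realistic choice.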

 


Last Edited: 2/14/05