|
Using a Bayesian Distance Measure to Combine Rare Event Definitions
William DuMouchel
Lincoln Technologies
Associations among rare events in sparse databases are hard to measure
because of small counts. For example, the US FDA has collected a few
million adverse drug reaction reports, but there are about 5,000
distinct drug products and 9,000 distinct adverse event codes in the
system, so most drug-event code combinations are rare. The event coding
vocabulary, the Medical Dictionary for Regulatory Activities (MedDRA),
does include a hierarchical grouping of its terms, but the groupings are
not optimal for detecting adverse drug reactions. For example, "Blood
pressure increased" and "Blood pressure decreased" are grouped together,
and such groupings tend to dilute signals of drug-event associations.
This talk describes an attempt to group the 9,000 MedDRA "Preferred Terms"
according to their statistical associations with the 5,000 drugs in the
database. A Bayesian estimate of the strength of every drug-event
association is used to define a distance measure between pairs of
events, after which a standard agglomerative clustering algorithm
combines the events into about 1,800 event groupings. The
association-finding algorithm is then rerun using event groups instead
of events, and the results are compared with the original analysis.
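
The pipeline described above (shrunken association estimates, a distance between events, agglomerative merging) can be illustrated with a minimal Python sketch. The abstract does not specify the Bayesian estimator, the distance, or the linkage rule, so everything here is an assumption made for illustration: additive smoothing of the log observed-to-expected ratio stands in for the Bayesian estimate, Euclidean distance between event profiles stands in for the distance measure, and average linkage stands in for the clustering rule. The function names and the toy count table are hypothetical.

```python
import math
from itertools import combinations

def shrunk_log_ratios(counts, prior=0.5):
    """counts[d][e] = reports mentioning drug d and event e.
    Returns log2((n + prior) / (E + prior)), where E is the count expected
    under independence. Adding `prior` pulls small counts toward zero;
    it is a simple stand-in for a Bayesian shrinkage estimate (assumption)."""
    n_drugs, n_events = len(counts), len(counts[0])
    row = [sum(r) for r in counts]
    col = [sum(counts[d][e] for d in range(n_drugs)) for e in range(n_events)]
    total = sum(row)
    return [[math.log2((counts[d][e] + prior) / (row[d] * col[e] / total + prior))
             for e in range(n_events)]
            for d in range(n_drugs)]

def event_distance(scores, a, b):
    """Euclidean distance between the drug-association profiles of events a and b
    (an assumed choice of distance)."""
    return math.sqrt(sum((r[a] - r[b]) ** 2 for r in scores))

def cluster_events(scores, n_groups):
    """Naive average-linkage agglomerative clustering of events (columns)."""
    n_events = len(scores[0])
    dist = {(a, b): event_distance(scores, a, b)
            for a, b in combinations(range(n_events), 2)}

    def avg_link(c1, c2):
        return sum(dist[(min(a, b), max(a, b))] for a in c1 for b in c2) / (len(c1) * len(c2))

    clusters = [[e] for e in range(n_events)]
    while len(clusters) > n_groups:
        # Merge the pair of clusters with the smallest average distance.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda p: avg_link(clusters[p[0]], clusters[p[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]

# Hypothetical toy table: 3 drugs (rows) by 4 events (columns).
counts = [
    [20, 18, 1, 2],
    [2, 1, 15, 17],
    [5, 6, 5, 4],
]
groups = cluster_events(shrunk_log_ratios(counts), n_groups=2)
print(groups)
```

On this toy table, events 0 and 1 share one drug profile and events 2 and 3 share another, so the two resulting groups pair them accordingly. The naive merge loop is only practical for small inputs; for thousands of events a library routine for hierarchical clustering would be the more realistic choice.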
|