Ontology alignment: A Content-Based Bayesian Approach
There are many legacy databases, and related stores of information that are maintained by distinct organizations, and there are other organizations that would like to be able to access and use those disparate sources. Among the examples of current interest are such things as emergency room records, of interest in tracking and interdicting illicit drugs, or social media public posts that indicate preparation and intention for a mass shooting incident. In most cases, this information is discovered too late to be useful. While agencies responsible for coordination are aware of the potential value of contemporaneous access to new data, the costs of establishing a connection are prohibitive. The problem grown even worse with the proliferation of “hash-tagging,” which permits new labels and ontological relations to spring up overnight. While research interest has waned, the need for powerful and inexpensive tools enabling prompt access to multiple sources has grown ever more pressing. This paper describes techniques for computing alignment matrix coefficients, which relate the fields or content of one database to those of another, using the Bayesian Ontology Alignment tool (BOA). Particular attention is given to formulas that have an easy-to-understand meaning when all cells of the data sources containing values from some small set. These formulas can be expressed in terms of probability estimates. The estimates themselves are given by a “black box” polytomous logistic regression model (PLRM), and thus can be easily generalized to the case of any arbitrary probability-generating model. The specific PLRM model used in this example is the BOXER Bayesian Extensible Online Regression model.
READ FULL TEXT