high-quality, near-synonymous relationships mapping to a total of 214 previously undocumented headwords (provided in Dataset S4). For example, the noun phenotype was discovered to be a near-synonym of trait (PP > 0.99), and the adverb unhealthily was paired with destructively (PP > 0.99), hazardously (PP > 0.96), and badly (PP > 0.90). Not all likely candidates for near-synonymy survived this conservative filtering, even though the apparent quality of the relationships correlated strongly with their inferred posterior probabilities of accuracy (see Figure 3D). For example, many proposals suggested by more than one Turker were not ultimately accepted (66%), but these were generally strong examples of hypo- or hypernymy (e.g., tribromide: anion). Correspondingly, 44% of the synonyms suggested by multiple Turkers made the final cut, more than double the 21% acceptance rate for those proposed a single time.

To assess the quality of our discovered synonymous relationships, we examined their semantic similarity within a corpus of nearly 5 million English Wikipedia articles. Specifically, we measured the semantic similarity among novel, true positive, and true negative synonym pairs by comparing the normalized information content of their shared linguistic contexts to those obtained from a null background (see Supporting Information Text S1) [53]. We found that random synonym pairs (true negatives) had an average semantic similarity of 0.62, whereas previously documented synonyms (true positives) had an average similarity score of 4.62 (Figure 3E and 3F). Importantly, the novel synonym pairs validated by our pipeline had an average semantic similarity score of 3.65, and many pairs had scores that were within the top 1% of those obtained by true positive relationships (Figure 3E). This result strongly suggests that at least a fraction of undocumented but easily discoverable relationships are potentially of very high quality.

Synonymy Matters for Biomedicine. PLOS Computational Biology | www.ploscompbiol.org

Figure 3. Undocumented, general-English headwords and near-synonyms can be acquired experimentally. (A) The distribution over the inferred accuracies of the annotators validating harvested synonyms. (B) The true positive rate (blue) and false discovery rate (red) of the validation process as a function of the posterior probability of annotation accuracy. Diagnostic statistics were computed using known and random pairings. (C) The Receiver-Operator-Characteristic curve for the statistical model of the validation process, computed using known and random pairings. (D) The distribution over the posterior log-odds in favor of annotation accuracy for the novel synonym-headword pairings, annotated with exemplar pairings (rejected in red and accepted in blue). (E) The distributions over semantic similarity scores for the true negative (red), true positive (green), and novel synonym pairs (blue). (F) Bootstrapped (10,000 re-samples) distributions over the average semantic similarity scores for each group of pairings, computed using the data depicted in (E). doi:10.1371/journal.pcbi.1003799.g

The Vast Majority of Biomedical Synonymy Is Undocumented

Having evaluated and validated the performance of our statistical methodology on the general-English dataset, we applied it to the biomedical terminologies described in the previous sections. The resulting estimates of undocumented synonymy were very high (see Table S5 for a summary of our statistical infere.