Target terms for this study and their concept identifiers in UMLS and SNOMED CT

| UMLS CUI | SNOMED CT identifier | VetCN dataset n-gram (frequency count) | PMSB dataset n-gram (frequency count) | BMJ Best Practice document |
|---|---|---|---|---|
| C0018801 | 84114007 | heart_failure (1292) | heart_failure (4615) | Chronic congestive heart failure |
| C0004096 | 195967001 | asthma (1194) | asthma (8891) | Asthma in adults |
| C0014544 | 84757009 | epilepsy (1164) | epilepsy (3521) | Generalised seizure |
| C0017601 | 23986001 | glaucoma (1657) | glaucoma (1635) | Open-angle glaucoma |
| C1561643 | 709044004 | ckd (2698) | CKD (1550) | Chronic kidney disease |
| C0029408 | 396275006 | osteoarthritis (1765) | osteoarthritis (1991) | Osteoarthritis |
| C0002871 | 271737000 | anaemia (1414) | anaemia (1154) | Assessment of anaemia |
| C0003864 | 3723001 | arthritis (8276) | arthritis (1023) | Rheumatoid arthritis |
| C0011849 | 73211009 | diabetes (3660) | diabetes (12846) | Type 2 diabetes in adults |
| C0020538 | 38341003 | hypertension (1132) | hypertension (8365) | Essential hypertension |
| C0028754 | 414916001 | obesity (1763) | obesity (10030) | Obesity in adults |
- The last column contains the names and references of the BMJ Best Practice documents used for validation in Step 5 (see the section Materials and methods). The first column contains the UMLS CUI mapped to each target term (n-gram) with the aid of MetaMap. The second column shows the SNOMED CT identifier mapped to that UMLS CUI with the aid of the UMLS API. The third column lists the target terms (n-grams) from the VetCN dataset, with their frequency counts in the corpus given in brackets. The fourth column lists the target terms from the PMSB dataset in the same format as the third column. The target terms (n-grams) are identical for both datasets except one: the well-known medical condition “chronic kidney disease” (UMLS CUI “C1561643”) appears as the n-gram “CKD” (a short form written entirely in upper case) in the PMSB dataset, but as “ckd” in the VetCN dataset. This difference arises because in Step 1 the VetCN corpus is converted to lower case, whereas the PMSB corpus is not.
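The effect of the Step 1 lower-casing on the “CKD”/“ckd” target term can be illustrated with a minimal Python sketch. It is not the study's pipeline: it assumes simple whitespace tokenisation and that multi-word terms are already joined with underscores (as in “heart_failure”). The function name `ngram_counts`, the toy corpus and the term sets are hypothetical and invented purely for illustration.

```python
from collections import Counter


def ngram_counts(text: str, lowercase: bool) -> Counter:
    """Count whitespace-delimited n-grams, optionally lower-casing first.

    Hypothetical simplification: multi-word terms are assumed to be
    pre-joined with underscores (e.g. "heart_failure"), so each token
    counts as one n-gram.
    """
    if lowercase:
        text = text.lower()
    return Counter(text.split())


# Toy corpus (invented; not taken from VetCN or PMSB).
corpus = "CKD stage 3 heart_failure CKD progression ckd"

vetcn_style = ngram_counts(corpus, lowercase=True)   # corpus lower-cased, as for VetCN
pmsb_style = ngram_counts(corpus, lowercase=False)   # case preserved, as for PMSB

print(vetcn_style["ckd"], vetcn_style["CKD"])  # 3 0
print(pmsb_style["ckd"], pmsb_style["CKD"])    # 1 2

# A subset of target terms from the two datasets: they differ only by case.
vetcn_terms = {"heart_failure", "asthma", "ckd"}
pmsb_terms = {"heart_failure", "asthma", "CKD"}
print(vetcn_terms - pmsb_terms)                        # {'ckd'}
print({t.lower() for t in pmsb_terms} == vetcn_terms)  # True
```

Lower-casing merges every case variant into a single n-gram, so “CKD” can no longer occur in the lower-cased corpus, while the case-preserving corpus keeps “CKD” and “ckd” as separate n-grams with separate counts.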