Study shows EMRs' potential to help determine genetic bases of diseases
Researchers at the Mayo Clinic have shown that, by leveraging electronic medical records (EMRs), they could identify genetic variants that influence susceptibility to peripheral arterial disease (PAD), which is associated with "significant mortality and morbidity."
The research was published in a recent issue of the Journal of the American Medical Informatics Association (JAMIA) by a team of authors from the Mayo Clinic Divisions of Cardiovascular Diseases and Biomedical Informatics and Statistics.
Researchers concluded that EMR-based data, used across institutions in a structured way, "offer great potential for diverse research studies, including those related to understanding the genetic bases of common diseases."
The physicians used EMRs to confirm cases of PAD and to identify phenocopies, i.e., mimics of atherosclerotic PAD. The disease affects approximately eight million Americans aged 40 and older, and 20 percent of the elderly (70+ years old) in the United States.
With patient consent and the approval of Mayo's Institutional Review Board, the research team accessed electronic medical records in a federated warehouse of patient data that Mayo Clinic has used since 1997, a database of more than eight million patients. Using the Mayo Enterprise Data Trust (EDT), the researchers extracted clinical variables for study participants that could confound the association between genetic susceptibility variants and PAD.
One of the study authors, Christopher G. Chute, MD, observed that the EDT "provides a scalable solution for clinical research, providing comparable and consistent data that can be employed in comparative effectiveness studies, outcomes research, or translational research as illustrated by this JAMIA paper."
In the study, PAD was defined as a resting or post-exercise ankle-brachial index (ABI) less than or equal to 0.9, a history of lower extremity revascularization, or poorly compressible leg arteries. Controls were patients without evidence of PAD. Demographic data and laboratory values were extracted from EMRs. Medication use and smoking status were identified by natural language processing (NLP) of clinical notes.
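The case definition above amounts to a simple rule. The following Python sketch is illustrative only and is not the study's code: the `PatientRecord` fields and the `is_pad_case` function are hypothetical names, and it assumes that an ABI at or below 0.9 on either the resting or the post-exercise measurement qualifies a patient as a case.

```python
from dataclasses import dataclass
from typing import Optional

# Threshold taken from the case definition reported in the article.
ABI_THRESHOLD = 0.9


@dataclass
class PatientRecord:
    """Hypothetical record layout; field names are assumptions for illustration."""
    resting_abi: Optional[float]          # None if not measured
    post_exercise_abi: Optional[float]    # None if not measured
    had_revascularization: bool           # history of lower extremity revascularization
    poorly_compressible_arteries: bool    # flagged by the vascular lab


def is_pad_case(record: PatientRecord) -> bool:
    """Return True if the record meets any one of the three PAD criteria."""
    measured = [v for v in (record.resting_abi, record.post_exercise_abi)
                if v is not None]
    # Assumption: a low value on either measurement is sufficient.
    low_abi = any(v <= ABI_THRESHOLD for v in measured)
    return (low_abi
            or record.had_revascularization
            or record.poorly_compressible_arteries)


if __name__ == "__main__":
    patient = PatientRecord(resting_abi=0.85, post_exercise_abi=None,
                            had_revascularization=False,
                            poorly_compressible_arteries=False)
    print(is_pad_case(patient))
```

In this sketch a patient who meets none of the three criteria would be a candidate control, matching the article's description of controls as patients without evidence of PAD.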
"Although manual abstraction of medical records can provide high-quality data," the authors write, "for large studies such as genetic association studies, manual review of medical records can be prohibitively expensive and time-consuming. Our study demonstrates … several significant advantages over traditional approaches to genomic medicine research by simplifying logistics, reducing timelines, and overall costs through efficient data acquisition."
In their statistical analyses, the researchers used metrics long recognized in the NLP and information-retrieval community (precision, recall, and F-measure) to evaluate the EMR-based algorithms against manual medical record review. Most cardiovascular risk factors and co-morbidities were captured from the EMRs with an accuracy rate higher than 90 percent. The researchers analyzed age, sex, BMI, race, geographical distribution, risk factors, co-morbidities, smoking status and medications.
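For readers unfamiliar with these metrics, the sketch below shows how precision, recall, and F-measure are conventionally computed when an algorithm's output is compared with manual review as the reference standard. The function name and the sample labels are made up for illustration; they are not data from the study, and the balanced F1 form of the F-measure is assumed.

```python
from typing import Sequence, Tuple


def precision_recall_f(predicted: Sequence[bool],
                       reference: Sequence[bool]) -> Tuple[float, float, float]:
    """Compare algorithm labels with manual-review labels for one variable.

    predicted: what the EMR-based algorithm reported (e.g. "is a smoker").
    reference: what manual chart review found for the same patients.
    """
    if len(predicted) != len(reference):
        raise ValueError("label lists must be the same length")

    pairs = list(zip(predicted, reference))
    true_pos = sum(1 for p, r in pairs if p and r)
    false_pos = sum(1 for p, r in pairs if p and not r)
    false_neg = sum(1 for p, r in pairs if not p and r)

    # Precision: share of algorithm positives that manual review confirms.
    precision = true_pos / (true_pos + false_pos) if (true_pos + false_pos) else 0.0
    # Recall: share of manually confirmed positives that the algorithm found.
    recall = true_pos / (true_pos + false_neg) if (true_pos + false_neg) else 0.0
    # F-measure (F1): harmonic mean of precision and recall.
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_measure


if __name__ == "__main__":
    # Invented labels for five patients, for illustration only.
    algorithm = [True, True, False, True, False]
    manual = [True, False, False, True, True]
    print(precision_recall_f(algorithm, manual))
```

Reporting both precision and recall matters here because an algorithm can score well on one while missing badly on the other; the F-measure summarizes the two in a single number.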