IBM, Mayo Clinic launch open source consortium to extract data from EMRs

By Molly Merrill
09:40 AM

Biomedical informatics researchers at the Mayo Clinic and IBM have launched a Web site for the newly founded Open Health Natural Language Processing (NLP) Consortium. The consortium will provide an open-source space for promoting past and current development efforts, including work on extracting information from electronic medical records.

The site will allow the approximately 2,000 researchers and developers working on clinical language systems worldwide to contribute code and further develop the systems.

"The recently passed American Recovery and Reinvestment Act promises to accelerate the adoption of electronic medical records," said Dan Pelino, IBM's general manager of Global Healthcare & Life. "Because the success of such reform rides on delivering interconnected and intelligent information to healthcare professionals everywhere, Mayo and IBM are tapping into the collaborative power of the open-source community to speed the development of Natural Language Processing (NLP). Adoption of this technology will provide physicians with insights into each patient's condition, allowing them to electronically retrieve the exact knowledge they seek from patient health records rather than reading through every record provided, as they must do today."  

The Mayo Clinic and IBM also released their clinical NLP technologies into the public domain.

NLP is a relatively new and specialized area within computer science dealing with computational methods for understanding human language. In medicine, clinical NLP systems process the vast repositories of text generated by patient-clinician interactions. Such systems categorize and structure that text according to standard nomenclature – in this case focusing on terms used in a range of medical specialties – which will ultimately speed data searches for both diagnoses and medical research.

NLP platforms, or "pipelines," aid indexing and searching electronic medical records within institutions to quickly find similar cases or conditions, so physicians are not reliant solely on their own clinical experience in analyzing a problem. Researchers may also use these tools to aid retrospective epidemiological studies or do groundwork for new clinical trials.

"We are inviting our international colleagues to help continue development of these valuable tools," said Christopher Chute, MD, a bioinformatics expert and senior consultant on the project at the Mayo Clinic. "By making it an open-source initiative, we hope to enable wide use of these NLP tools so medical advancements can happen faster and more efficiently."

Building on IBM's open-source Unstructured Information Management Architecture, the Mayo Clinic and IBM have developed a system for extracting information from more than 25 million free-text clinical notes. As part of the system, developers build strings of "annotators" that become a pipeline, allowing physicians to mine the text for references to specific conditions, drugs, diseases, signs and symptoms, anatomical areas or organs, or treatment procedures.
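
To make the idea of chaining "annotators" into a pipeline concrete, here is a minimal sketch in Python. It is an illustration only, not the Mayo Clinic and IBM system and not the Unstructured Information Management Architecture API: the names `Annotation`, `dictionary_annotator`, `run_pipeline` and the tiny term lists are hypothetical, and simple dictionary matching stands in for the far richer analysis a real clinical pipeline would perform.

```python
import re
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Annotation:
    """A labeled span of text (hypothetical structure for this sketch)."""
    start: int
    end: int
    label: str
    text: str


# An annotator reads the note plus the annotations found so far
# and returns an extended list of annotations.
Annotator = Callable[[str, List[Annotation]], List[Annotation]]


def dictionary_annotator(label: str, terms: List[str]) -> Annotator:
    """Build an annotator that tags whole-word, case-insensitive term matches."""
    alternatives = "|".join(re.escape(term) for term in terms)
    pattern = re.compile(r"\b(" + alternatives + r")\b", re.IGNORECASE)

    def annotate(text: str, existing: List[Annotation]) -> List[Annotation]:
        found = [
            Annotation(m.start(), m.end(), label, m.group(0))
            for m in pattern.finditer(text)
        ]
        return existing + found

    return annotate


def run_pipeline(text: str, annotators: List[Annotator]) -> List[Annotation]:
    """Run each annotator in order, then sort the results by position."""
    annotations: List[Annotation] = []
    for annotate in annotators:
        annotations = annotate(text, annotations)
    return sorted(annotations, key=lambda a: a.start)


# Toy term lists, assumed purely for illustration.
PIPELINE = [
    dictionary_annotator("drug", ["aspirin", "metformin"]),
    dictionary_annotator("symptom", ["chest pain", "nausea"]),
    dictionary_annotator("anatomy", ["left arm", "liver"]),
]

note = "Patient reports chest pain radiating to the left arm; started aspirin."
for annotation in run_pipeline(note, PIPELINE):
    print(annotation.label, annotation.text, annotation.start, annotation.end)
```

Each annotator in the sketch handles one category of term, so adding a new category means appending one more annotator to the list rather than changing the existing ones.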

IBM and the Mayo Clinic have also developed a system to extract cancer disease characteristics from unstructured pathology reports to facilitate "consistent retrieval and transmission of cancer cases." The system extracts tumor characteristics, lymph node status and metastatic disease information, enabling the automatic computation of cancer stage.

"There is a treasure trove of historical unstructured data that provides essential information for the study of disease progression, treatment effectiveness and long-term outcome which NLP systems make available to clinicians and researchers," said Anni Coden, PhD, IBM's NLP principal on the project. "Such data can provide guidance for prospective studies and furthermore facilitate the integration of data from multi-modal data sources."

The two clinical text solutions released by the Mayo Clinic and IBM are aimed at processing two specific types of notes. Clinical notes describe patient-physician encounters, while pathology reports center on tissue findings. Physicians will be able to search past records to examine earlier cases of rare conditions, thereby "conferring" with their colleagues across time to aid diagnosis and treatment decisions.

"Large-scale information extraction from the clinical narrative is a vital component in advancing translational research and patient care," said Guergana Savova, PhD, medical informatics specialist and Mayo's NLP lead on the project. "It 'unlocks' the clinical textual data that resides in huge repositories. Such technology would allow for large-scale data aggregation, analyses and usage – just imagine the power of data from millions of patients."
 
