Big data: opportunity and challenge
Over the past two decades, America has experienced a societal revolution led by the Internet and the availability of “big data,” defined by the Wall Street Journal as “the ability to collect, process and interpret massive amounts of information.”
Although big data has transformed modern culture, massive information sharing and analysis have yet to generate significant benefits within healthcare.
The promise of big data in healthcare is revolutionary. Use of big data will ease the transition to authentic data-driven healthcare, allowing healthcare professionals to improve the standard of care based on millions of cases, define needs for subpopulations, and identify and intervene for population groups at risk for poor outcomes. To date, few healthcare professionals would claim that the promise of big data has been fulfilled.
To understand the growth and usage of big data, it makes sense to look at an industry with advanced use of big data: consumer information technology.
Big data in consumer IT
Consumer IT uses big data daily. Accessing the Internet via Google or reaching out to friends via Facebook relies on the massive collection and transformation of information. How did consumer IT gain the ability to deliver massive value from massive data? Three trends supported this transformation.
1. Data availability: Information availability grew in the 1990s as the Internet offered a source of content with formalized protocols and broad access.
2. Metadata and grouping: Collecting the information wasn’t enough; applications had to understand the information. Information coding and grouping accelerated in the 2000s thanks to algorithms and systems that incorporated keyword matching, social grouping, natural language processing (NLP) and search algorithms. Simple text became marked-up content, indexed and annotated for use.
3. Applications to leverage big data: Once information was available and annotated, companies like Google, Facebook and LinkedIn capitalized on a wealth of usable data and enabling technologies to meet consumers’ emerging information needs.
Big data in healthcare
If consumer IT accomplished so much over the last two decades, can healthcare derive similar high value from its big data? Given the high stakes and serious concerns, the answer in the short term is “maybe.” Industry commitment is shaped by multiple concerns, including data security, de-identification, patient versus societal benefit, stakeholder profits and political swings.
To balance these concerns and questions, healthcare professionals must understand where the industry stands on each of the three trends that enable big data usage.
1. Data availability: The goal is to make identified information available for care and de-identified information available for system improvement, but healthcare data rarely crosses institutional boundaries. Information hoarding, political conflict and lack of interoperability make even the most limited health information exchange (HIE) initiatives a challenge.
The challenges of detailed data sharing are laid out in “Sharing Data Electronically” in the April 25, 2012, issue of the Journal of the American Medical Association (JAMA).
2. Metadata and grouping: Information without structure is useless to software. While an individual physician might be able to read narrative text within an electronic medical record (EMR), analytics applications cannot utilize unstructured data. Most industries produce and store roughly 80 percent of their data in unstructured form, according to the “80 percent rule” articulated by Merrill Lynch in the 1990s. Although Google helps solve this problem for consumers, clinical data, as captured in narrative notes, is most often stored as simple text within siloed medical records. The large majority of analytics is built solely on claims or administrative data, creating major limitations, according to “Finding Pure and Simple Truths with Administrative Data,” an editorial in the Journal of the American Medical Association. That problem must be solved to support big data in healthcare.
3. Applications to leverage big data: When access and metadata are addressed, healthcare must still address the challenge of modernizing application infrastructure. In other industries, applications are built as components, allowing the best components to be reused, improved, and scaled. In healthcare, applications tend to be built vertically, starting from source data and recreating extraction, data mining, and user interface techniques in custom solutions. Unfortunately, constantly creating end-to-end software from scratch limits innovation, undermines application power, increases expense, and fails to allow for the low-cost, rapidly built, powerful applications seen in other industries.
Despite healthcare’s limitations in dealing with data, the industry’s outlook on data is getting brighter.
The shift to data-driven healthcare
Healthcare has cause for optimism. Industry and government incentives, as well as HIE systems and technologies, are coming to grips with data access. At the same time, leading companies in terminology services, natural language processing (NLP), and data warehousing are addressing data structuring and grouping, providing the annotation needed to empower applications. Application vendors have begun the transition to modern development techniques, building on top of best-of-breed data components and providing cloud offerings. While the process isn’t fast, it’s moving forward.
The question is not whether big data will revolutionize healthcare, but when.
The first applications in data-driven healthcare are likely to be for local quality and efficiency improvement.
Then, as providers make meaningful gains locally, they will join the drive toward regional population healthcare and toward testing and distributing best practices. Finally, with broad availability of de-identified content, providers can make rapid strides in assessing interventions and outcomes to improve the standard of care globally.
The benefits of utilizing big data are great, but the challenges are significant. The industry must overcome a longstanding reluctance to re-engineer and improve processes. It must address information security needs for identified and de-identified data usage. And it must accept necessary business model changes and make peace with crossing institutional and political boundaries. Fortunately, most healthcare professionals entered the industry to benefit society. Their hearts are in the right place, and they share a common goal.
As the industry overcomes these barriers, the opportunity to improve health will grow as big as the term “big data” implies.
Dan Riskin, MD, is the CEO of Health Fidelity, provider of a commercial-grade, cloud-based natural language processing (NLP) service. Riskin is also a consulting assistant professor at Stanford University.