Big data's promise, constraints: Part 2
In Part 1 of this series we provided a loose definition of big data, described some of the ways that big data tools can be used in health, and identified the high degree of alignment of big data capabilities with quality and efficiency analytics as well as observational health research.
Big data tools also show great promise in managing the copious amounts of health data emanating from patients via social networking and home monitoring, as well as many areas that have a genomic data component. We also pointed out the irony that while quality and efficiency uses can frequently fall under HIPAA “treatment, payment, and operations,” the use of patient-identifiable data for research, by virtue of being “designed to develop or contribute to generalizable knowledge,” must satisfy much more stringent constraints.
Some big data analytics and observational research can also be done on HIPAA de-identified data. But the traditional issues with de-identified data will be particular obstacles for other big data outcomes. Big data tools and data sets, for example, will increasingly bring re-identification of HIPAA de-identified data to the fore. When larger and broader publicly available data sets are joined with newly de-identified data, existing de-identification approaches become even less durable and identities become easier to re-establish.
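To make the re-identification risk concrete, here is a minimal sketch of a linkage attack, assuming the pandas library and entirely hypothetical records and column names. A “de-identified” clinical extract is joined to a public data set on shared quasi-identifiers; any unique match re-establishes identity.

```python
# Minimal sketch of a linkage (re-identification) attack. All records
# and column names here are hypothetical.
import pandas as pd

# "De-identified" extract: direct identifiers removed, but quasi-
# identifiers (3-digit ZIP, birth year, sex) retained.
clinical = pd.DataFrame({
    "zip3": ["021", "021", "945"],
    "birth_year": [1954, 1987, 1954],
    "sex": ["F", "M", "F"],
    "diagnosis": ["E11.9", "J45.909", "I10"],
})

# Publicly available data set (e.g., a voter roll) with names attached.
public = pd.DataFrame({
    "name": ["A. Smith", "B. Jones", "C. Lee"],
    "zip3": ["021", "021", "945"],
    "birth_year": [1954, 1987, 1954],
    "sex": ["F", "M", "F"],
})

# Joining on the quasi-identifiers re-attaches names to diagnoses
# wherever the combination is unique in both data sets.
linked = clinical.merge(public, on=["zip3", "birth_year", "sex"])
print(linked[["name", "diagnosis"]])
```

The larger and broader the public data sets available for such joins, the more combinations become unique, which is precisely why existing de-identification approaches grow less durable as big data accumulates.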
De-identified data are also challenging for the kind of deep analytics needed to try to differentiate causality from correlation in observational data. It is the patient’s identity that binds together data for patient-centric research and allows continuous aggregation and linking of data over time. Of course, fully de-identified data also do not support communicating with the patient when new findings or therapies are identified.
Observational data and interoperability
Analyzing clinical and claims data for quality and efficiency in accountable care is certainly a big driver for considering big data in healthcare right now. In this area, as in others, the flexibility of big data tools to work with unstructured as well as structured data helps in pursuing this very complex task.
It is important, then, to also consider improving the sharing and management of less well-structured data. Continuity of care infrastructure requires data from one EHR to be consumed and processed by another EHR via highly structured data messages like the HITSP C32. Big data approaches benefit from, but are not wholly constrained by, such highly structured data. Standards and technologies for indexing, marking up, matching, and linking data are nonetheless important. For the time being these efforts will rely on ad hoc data warehouse accumulation techniques unless other standardization of infrastructure is advanced. Some of the mostly ignored President's Council of Advisors on Science and Technology recommendations for health IT focused on a language for structured data exchange, but many others focused on supporting technologies like indexing, mark-up and linking. These capabilities, standardized across the industry, would help establish infrastructure that supports big data aggregation and analysis, not just exchange.
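As an illustration of what such an indexing layer might do with less well-structured data, here is a minimal sketch, with hypothetical note IDs and text, of an inverted index that lets free-text clinical notes be matched and linked by term rather than only through rigid message structures.

```python
# Minimal sketch of indexing unstructured clinical notes: an inverted
# index mapping each normalized term to the notes that mention it.
# Note IDs and note text are illustrative.
from collections import defaultdict
import re

notes = {
    "note-001": "Patient reports wheezing and shortness of breath.",
    "note-002": "Follow-up for type 2 diabetes; A1c improved.",
    "note-003": "Wheezing resolved; continue inhaler as needed.",
}

index = defaultdict(set)
for note_id, text in notes.items():
    # Lowercase and split on non-alphanumerics to normalize terms.
    for term in re.findall(r"[a-z0-9]+", text.lower()):
        index[term].add(note_id)

# Matching: find all notes that mention "wheezing".
print(sorted(index["wheezing"]))  # ['note-001', 'note-003']
```

Standardizing capabilities like this across the industry, rather than standardizing only the exchange messages themselves, is the kind of supporting infrastructure those recommendations pointed toward.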
Quality management versus quality reporting
There is an analogous tension on display in the area of healthcare quality. The approach taken to advance national quality reporting standards in meaningful use and other CMS programs is for highly structured and nationally consistent quality measures to be uniformly reported from care settings. This may be the right approach for advancing the measurement of national quality outcomes, but it does not take advantage of the massive amount of less well-structured electronic data already available in the healthcare infrastructure. It also does not fully recognize the role of clinical data systems in doing the quality management that can lead to better quality measurements.
Local quality management efforts are frequently based on less nationally consistent data. These data have taken a back seat of late to the attention on highly structured national quality reporting. In quality reporting, and particularly for pay-for-performance purposes, it is hard to imagine not having nationally reportable metrics. However, with the advent of big data analytics, it may be prudent to ask whether a compound approach would be viable, one that gives more latitude to local big data quality management and then abstracts analytics for reporting purposes. Such a compound approach is possible where the reporting concerns the change in local quality values rather than absolute measures. It remains to be seen whether the current pushback on the many structured quality measures now being applied will affect the local advancement of quality management as meaningful use quality measurement certainly did.
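As a toy illustration of change-based reporting, with assumed measure definitions and numbers, a site might compute its own locally defined measure and report only the period-over-period change:

```python
# Toy sketch of "report the change, not the absolute value": a site
# computes period-over-period improvement in a locally defined quality
# measure and reports only the relative change. Values are illustrative.

def measure_rate(numerator: int, denominator: int) -> float:
    """Local quality measure, e.g., share of patients meeting a goal."""
    return numerator / denominator

q1 = measure_rate(412, 603)   # prior period, local definition
q2 = measure_rate(467, 598)   # current period, same local definition

relative_change = (q2 - q1) / q1
print(f"Reported improvement: {relative_change:+.1%}")  # +14.3%
```

Under such a scheme the locally defined numerator and denominator would not need to be nationally consistent; only the direction and size of the change would be reported.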
Population and public health
The prevailing interpretation of HIPAA is that quality and efficiency-related population health activities carried out in an accountable care organization, or other clinical care setting, qualify as HIPAA “treatment, payment and operations” (unless they “contribute to generalizable knowledge”). However, there is still enough ambiguity about borderline population quality/research efforts to raise concern in some health attorneys’ eyes and blur some lines. Historically, many issues attributed to HIPAA have actually stemmed from differing local interpretations and state-by-state additions as much as from HIPAA itself. Just as before, it is not clear where some of the fine lines will be drawn locally with respect to these borderline research activities for population health.
Public health activities, on the other hand, mostly receive authorization from state statutes and are enabled by, but not greatly controlled by, specific HIPAA language. Big data concepts already have some public health traction in the context of “biosurveillance” or, as it is more commonly known now, “syndromic surveillance.” In these activities, data formatted for clinical care purposes are “mined” for signals. There are a variety of public health uses for these data, including situational awareness during emergencies and the monitoring of general health trends and status. Current state policies for these data, unlike structured disease reporting, frequently involve limited, de-identified data being moved to state health departments.
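As a rough sketch of this kind of signal mining, in the spirit of common aberration-detection methods such as the CDC EARS C-family (the counts, baseline window, and threshold here are all illustrative assumptions), daily syndrome counts can be screened for unusual spikes:

```python
# Simplified sketch of syndromic surveillance aberration detection:
# flag a day whose syndrome count exceeds the recent baseline mean by
# more than a chosen number of standard deviations. Data and threshold
# are illustrative.
from statistics import mean, stdev

daily_counts = [12, 9, 14, 11, 10, 13, 12, 29]  # e.g., ILI visits/day
BASELINE_DAYS = 7
THRESHOLD_SD = 3.0

# Baseline: the window of days preceding today.
baseline = daily_counts[:-1][-BASELINE_DAYS:]
mu, sigma = mean(baseline), stdev(baseline)
today = daily_counts[-1]

if sigma > 0 and (today - mu) / sigma > THRESHOLD_SD:
    print(f"Alert: {today} visits vs. baseline {mu:.1f} ± {sigma:.1f}")
```

Because a screen like this operates on counts rather than patient-level records, it fits the limited, de-identified data that states typically receive today.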
Unfortunately, as in other uses, deeper clinical data that are linkable to the patient can also be helpful here, supporting the investigation of outbreaks, the notification of exposures, and the suppression of infectious disease spread. Some of these data never leave clinical care environments. Current efforts to more broadly tap locally held data for investigation and other public health purposes have focused on either remote EHR access or highly structured distributed query. Fully leveraging local big data tools, however, could represent a different path to local query, one that accommodates less well-structured data and more generalized use.
Moving forward
The wide range of policy and utilization challenges for big data is at least encouraging for the many health uses it suggests; there are many more such uses than can be mentioned here. As big data work progresses, the known utilization obstacles for analytics should become an increasing focus. Different policy considerations will need to be advanced to enable big data outcomes that match the full big data hype.
Big data particularly presses on the limits of how the currently polar identifiable vs. HIPAA de-identified data policies are implemented. As data sets grow bigger, more sophisticated delineations will be needed. There will need to be levels of access in which selective and strong authorization plays a larger role, enabling the use of data that lack direct identifiers but maintain levels of patient linkage, in order to achieve more complete big data outcomes.