Policy and implementation challenges to achieving big data outcomes (part 1)
"Big data" must be near the top of its hype cycle by now. As with other technologies, it may eventually deliver on a great deal of this hype, but the outcomes will probably come later than the current frenzy would suggest.
Part of the delay is that "new" technologies, such as big data, are frequently restrained by "old" policies and the "old" approaches of existing technologies. It takes time, and sometimes policy and utilization changes, to fully accommodate a new technology's potential. This two-part series of articles will point to key places in health policy and data use where current approaches may be impeding full big data outcomes.
Knowing big data when you see it
The term "big data" is being applied to many different things now, but exactly what is included is not always clear. One way that big data is defined is by the use of specific tools, such as the Hadoop framework, that are needed to practically deal with extremely large data stores. But while this is a convenient way to define things, it is also a somewhat circular definition. What is more, it does not really speak to the changes in approach and the differing utilization considerations that are involved in taking advantage of huge stores of data.
Specifically, big data tools facilitate pulling together great amounts of available data to support an objective, whether or not those data were recorded specifically and narrowly for that objective (in health, such data are sometimes called "secondary" data if they were recorded initially for clinical care purposes but then used for something else). Sometimes the data are simply a convenient surrogate for more specific data that are much harder to collect (Google searches as a surrogate for influenza reporting). Sometimes data are recorded in much greater detail than previously because the constraints of managing such great quantities of data are reduced (such as with physiologic monitoring data).
At times there also may be valuable "signals" in data we did not collect before. We simply may not have known that the "junk" data were not really "junk" (such as the large portion of DNA that is not used for direct protein synthesis, parts of which have been found to modulate gene expression).
Some of these data may also have been recorded in less than ideal ways from a data analysis standpoint. They may be very raw, may be in unstructured form (such as narrative text), or may be in any of multiple different electronic formats (video and audio files, document images, etc.). In health, these format considerations are critical because there are so many ways that information is recorded in clinical care (imaging devices, sensors, software systems…) and because the health industry continues to struggle to get even a fraction of its information into standardized formats.
All of these variations involve aspects of big data that are about using the increasing amounts of data you can get rather than collecting the exact data you think you need. For our purposes, we will call these new ways of looking at the data that you can get "observational."
Observational health research
Big data tools offer great promise for new approaches to health research. As health care tries to broaden the scientific basis for treatments and begins to engage in comparative effectiveness work, it needs to use data that have been accumulated for clinical care, for communications, and for other purposes. It also needs to reuse data previously accumulated in other research for follow-up, further extension, or new hypothesis testing. Big data tools offer the opportunity to supplement the "traditional" research analysis of limited sets of specifically extracted and highly specified data with big data analysis of huge amounts of less well-structured, less well-specified electronic clinical care, "social media," and other data.
Of course, where possible, it is still desirable to have well-structured, highly specified data, but that route to research alone does not seem to address the size of the research problem in front of us. What is more, general scientific funding, like much of the rest of the national discretionary budget, is threatened by reductions.
Big data has many potential roles in research, but a major one is this "mining" of large, less well-structured data that exist as a byproduct of clinical care and other engaged electronic systems. In general, there are two policy approaches to using health data that originate with an identifiable patient or person. One is to de-identify the data, and the other is to obtain the individual's consent for their data to be used. We will discuss big data issues with de-identified data later. HIPAA "consent" presents issues for some big data uses.
HIPAA presents problems for "secondary" and "future" research done on existing data, even after the addition of the long-awaited HIPAA Omnibus rule. The new Omnibus rule does make progress on the use of "compound authorizations" to enable, among other things, two or more research-related authorizations (such as for two different research studies) to be obtained at the same time. However, the Omnibus rule still requires that future investigations on the data be described in "sufficient detail" in the patient consent in order to be allowed.
The conflict here is that while big data approaches offer great opportunity for additional queries and subsequent analysis of large data sets for unexpected findings and secondary conclusions, HIPAA requires that patients be re-consented if the new investigations are of a different nature than the original work.
The ironic part is that these HIPAA consent constraints apply to work defined as being "designed to develop or contribute to generalizable knowledge," as opposed to local treatment, payment, and operations uses. Many people struggle to understand why such a laudable population health goal actually brings greater constraints on how data can be used.
In the next article we will consider some of the problems with HIPAA de-identified data as well as other population health big data issues.