Is synthetic data the key to healthcare clinical and business intelligence?

At HIMSS20, Robert Lieberthal, an economist at The MITRE Corporation, will offer a deep dive into synthetic data, showing how it can help health systems achieve cost efficiencies.

By Bill Siwicki

February 21, 2020

11:38 AM

The open source synthetic data source, Synthea. (Diagram courtesy of The MITRE Corporation.)

Update: HIMSS20 has been canceled due to the coronavirus.

Cost data is crucial to enabling a consumer revolution in healthcare, yet it is often unavailable. Synthetic data is a tool that can potentially help solve this problem, because financial outcomes can be incorporated into synthetic data.

But healthcare data is challenging to work with because it involves large, non-interoperable and sensitive files. A data set for 1 million patients can easily reach into the gigabytes (or more), especially when it involves a condition with many procedures, a large number of medications or substantial follow-up tests.

In addition, these files often are not shared across systems, and often not even within a single system.

Many patients have had the experience of having the same lab work done by a doctor’s office and then again by a hospital, even when the two are located in the same building. That duplication harms patients, wastes resources and delays access to needed care.

Difficult to perform analyses

“And healthcare data is among the most sensitive in our society,” said Robert Lieberthal, principal, health economics at The MITRE Corporation. “How personal health is, and the need to protect healthcare data under HIPAA and other laws, make it difficult to perform the types of analyses used for predictive modeling and improved outcomes in other industries like transportation, retail and even housing.”

This problem is particularly important and applicable to financial data about healthcare. Total claims, claims amounts, negotiated rates and billing codes often are proprietary. Insurance claims data systems often are not interoperable with clinical – electronic health record – data, making financial information like prices difficult to obtain either ahead of time or at the point of care.


“Instead, patients, providers and even payers typically are unaware of the negotiated and paid cost of a particular service until well after the care is delivered,” Lieberthal explained.

“Financial data also tends to lag clinical data by a wide margin. Also, patients often are unwilling or unable to share the cost of their specific condition or their household’s cost of care; crowdsourcing and other methods that have been used to share information within patient groups are simply not an option for cost. As a result, patients may forgo care because of the reality, or perception, that they cannot afford their care.”

What are the challenges?

The challenges here involve the poor outcomes, high cost, negative patient experience and provider burden all too common in many parts of the healthcare system, Lieberthal said.

“We know there are high rates of mortality and morbidity – for example, ED visits and preventable readmissions – that are directly related to the characteristics of healthcare data and health IT,” he said. “This leads to high costs, meaning that we are paying more in many cases despite getting less. As a result, patients are perplexed and, in many cases, angry about their lack of ownership over their own data and about the need to bring their medical records with them from doctor to doctor.”

Providers are burnt out, too – they report a high and growing burden from time spent recording data in EHRs rather than interacting with their patients. That burnout is chasing qualified people out of healthcare at a time when the industry needs more doctors, nurses, and other health professionals, especially for older populations and in underserved areas.

How does synthetic data solve the problems?

So why is the use of synthetic data needed here? What does it do to address the problem and tackle the challenges?

“Synthetic data is a solution to many of the problems that plague our health IT system,” Lieberthal contended. “Synthetic data generally consists of fully synthetic – fabricated – patient records and claims data. It is different from partially de-identified data, or data sets where variables have been censored or removed in order to strip out protected health information.”

Synthetic data is not derived from individual patient records, so it can never be linked back to a specific person or to that person’s cost data. Instead, it is developed, calibrated and validated against real-world data to make it realistic, Lieberthal explained.

“Once the synthetic data has been created, it can be improved through shrinking the size of data or its complexity,” he continued. “Synthetic data also can be used to simulate the health IT system of the future, such as fully interoperable data or integrated clinical/EHR and claims/insurer data.”

Designed from scratch

Synthetic data addresses the problems of real-world healthcare data by being designed from scratch to solve problems rather than justify reimbursement or simply replace paper records, he added.

“Researchers, innovators, entrepreneurs and policy makers all are creating synthetic patient records to answer a number of important healthcare questions,” he said. “At MITRE, we are working on Synthea, an open source, fully synthetic set of EHR data. Synthea is based on realistic patient transitions for a wide range of conditions, and has been used to create synthetic cohorts of entire states and important disease states and populations – for example, cardiovascular disease, veterans populations and end stage renal disease.”
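To make the idea of “patient transitions” concrete, the toy Python sketch below walks simulated patients through a small state machine, moving each one between health states with fixed monthly probabilities. The three states, the probabilities and the function names are hypothetical and not calibrated to any real data; this is not Synthea’s own module format or code, only an illustration of the general state-transition approach.

```python
import random

# Hypothetical monthly transition probabilities between health states.
# Illustrative only: not calibrated to real-world data and not Synthea's format.
TRANSITIONS = {
    "healthy": {"healthy": 0.97, "diagnosed": 0.03},
    "diagnosed": {"diagnosed": 0.90, "treated": 0.10},
    "treated": {"treated": 1.0},  # absorbing state in this toy model
}


def simulate_patient(months, rng):
    """Walk one synthetic patient through the state machine, one step per month."""
    state = "healthy"
    history = [state]
    for _ in range(months):
        next_states = list(TRANSITIONS[state])
        weights = [TRANSITIONS[state][s] for s in next_states]
        state = rng.choices(next_states, weights=weights, k=1)[0]
        history.append(state)
    return history


def simulate_cohort(n_patients, months, seed=0):
    """Generate a reproducible cohort of fully synthetic patient histories."""
    rng = random.Random(seed)
    return [simulate_patient(months, rng) for _ in range(n_patients)]


cohort = simulate_cohort(n_patients=1000, months=24)
final_states = [history[-1] for history in cohort]
print({s: final_states.count(s) for s in TRANSITIONS})
```

Because the random generator is seeded, the same cohort comes out on every run, which is one way a synthetic data set can be shared and reproduced without involving any real patient.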

Using synthetic data in a sandbox environment allows developers, clinicians and others to test EHR systems and other health IT tools before deploying them to the bedside, leading to better solutions without the harm from alpha or beta testing in the field, he explained.

“The main components of synthetic data that make it useful are built-in interoperability, integration of clinical and claims data, and the open source communities built up around synthetic data,” Lieberthal said. “The types of interoperable, complete patient records that exist in synthetic data sources rarely exist in the real world, at least not in the U.S., breaking the silos that exist between different provider groups.”

Finding the value of care

The connection between the clinical outcomes of a patient visit and its costs rarely exists in practice, so being able to assess these trade-offs in synthetic data allows for measurement and enhancement of the value of care – outcomes divided by cost, he added.
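Once clinical outcomes and claims costs sit in the same synthetic record, the value calculation becomes simple arithmetic. The sketch below assumes hypothetical episode records, each with a care pathway, a total cost and a yes/no outcome flag, and reports both good outcomes per $1,000 spent and its reciprocal, cost per good outcome. The `Episode` class, its field names and the numbers are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Episode:
    """One hypothetical synthetic care episode linking a claims cost to a clinical outcome."""
    pathway: str
    cost_usd: float
    good_outcome: bool  # e.g., no readmission within 30 days (illustrative)


def value_by_pathway(episodes):
    """Return, per pathway, good outcomes per $1,000 spent and cost per good outcome."""
    totals = {}  # pathway -> (total cost, count of good outcomes)
    for ep in episodes:
        cost, good = totals.get(ep.pathway, (0.0, 0))
        totals[ep.pathway] = (cost + ep.cost_usd, good + int(ep.good_outcome))
    result = {}
    for pathway, (cost, good) in totals.items():
        result[pathway] = {
            "outcomes_per_1000_usd": 1000 * good / cost,
            "cost_per_good_outcome": cost / good if good else float("inf"),
        }
    return result


episodes = [
    Episode("A", 4000.0, True),
    Episode("A", 6000.0, False),
    Episode("B", 3000.0, True),
    Episode("B", 3000.0, True),
]
print(value_by_pathway(episodes))
```

Here pathway B yields more good outcomes per dollar than pathway A, which is the kind of trade-off that is hard to see when cost and outcome data live in separate systems.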

“Finally, the open source community leads to a much wider range of developers who can work on this problem, leading to new ideas and a much larger pool of people who can tackle these difficult healthcare issues,” he said.

In many ways, synthetic data reflects statistician George Box’s observation that “all models are wrong,” while providing a “useful approximation [of] those found in the real world,” Lieberthal said.

“Similarly, synthetic data is likely not a 100% accurate depiction of real-world outcomes like cost and clinical quality, but rather a useful approximation of these variables,” he explained. “In addition, synthetic data constantly is improving, and methods like validation and calibration will continue to make these data sources more realistic.”

An open source nature

In particular, the open source nature of many synthetic data sources, like Synthea, means that they are more open to scrutiny, analysis and improvement than data generated from the practice of, and reimbursement for, healthcare services, he contended.

“In a way, synthetic data represents current health IT standards while also incorporating the best of what health IT could be,” Lieberthal stated. “For example, Synthea and other efforts typically use Fast Healthcare Interoperability Resources Specification (FHIR), a growing, acknowledged standard for interoperable records.”
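FHIR represents health information as resources, commonly exchanged as JSON, each identified by a `resourceType`. As a minimal sketch, assuming FHIR’s standard Patient resource fields (`name`, `gender`, `birthDate`), the Python snippet below builds one fabricated patient record with the standard library. The helper name `make_synthetic_patient` and all values are hypothetical, and real Synthea output contains far more detail.

```python
import json
import uuid

# Administrative gender codes allowed by the FHIR Patient resource.
FHIR_GENDERS = {"male", "female", "other", "unknown"}


def make_synthetic_patient(family, given, gender, birth_date):
    """Build a minimal FHIR Patient resource as a Python dict.

    All values passed in are fabricated; no real person is described.
    """
    if gender not in FHIR_GENDERS:
        raise ValueError(f"unsupported FHIR gender code: {gender}")
    return {
        "resourceType": "Patient",
        "id": str(uuid.uuid4()),  # random identifier, not linked to anyone
        "name": [{"family": family, "given": [given]}],
        "gender": gender,
        "birthDate": birth_date,  # FHIR date format: YYYY-MM-DD
    }


patient = make_synthetic_patient("Example", "Alex", "female", "1970-01-01")
print(json.dumps(patient, indent=2))
```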

In addition, synthetic data often is represented using user-friendly interfaces, such as graphical standards for representing care pathways, which give non-developers access to synthetic data tools, he said.

“In other ways, synthetic data looks a lot like real-world data, and is used for development in a wide variety of settings, such as clinical quality measures and SyntheticMass, synthetic patient data for the state of Massachusetts,” he said.

“As a result, synthetic data is now so popular that there probably is no single characterization that fits all synthetic data. Instead, almost any situation where real-world healthcare data is used can be, and probably is being, represented with synthetic data. That allows for a low-cost, low-burden testing environment whose results can then be validated using real-world data.”

Lieberthal will explain more during his HIMSS20 session, “Using Synthetic Data to Simulate Healthcare Costs.” It’s scheduled for Thursday, March 12, from 1:15-2 p.m. in Hall E, booth 8200.

Twitter: @SiwickiHealthIT
Email the writer: bill.siwicki@himssmedia.com
Healthcare IT News is a HIMSS Media publication.