AWS leader talks about technologies needed to take precision medicine to the next level
One of the most significant challenges to the advancement of precision medicine has been the lack of an infrastructure to support translational bioinformatics, one that helps organizations combine unique datasets to find novel associations and signals.
By supporting greater interoperability and collaboration, data scientists, developers, clinicians and pharmaceutical partners can leverage machine learning to reduce the time it takes to move from insight to discovery, ultimately leading to the right patients receiving the right care, with the right therapeutic, at the right time.
To get a better understanding of challenges surrounding precision medicine and its future, Healthcare IT News sat down with Dr. Taha Kass-Hout, director of machine learning at AWS.
Q: You've said that one of the most significant challenges to the advancement of precision medicine has been the lack of an infrastructure to support translational bioinformatics. Please explain this challenge in detail.
A: One of the challenges in developing and utilizing storage, analytics and interpretive methods is the sheer volume of biomedical data that needs to be transformed, data that often resides on multiple systems and in multiple formats. The future of healthcare is vibrant and dynamic, and there is an opportunity for cloud and big data to take on a larger role in helping the industry address these areas.
For example, datasets used to perform tasks such as computational chemistry and molecular simulations, which help de-risk and advance molecules into development, contain millions of data points and require billions of calculations to produce an experimental output. To bring new therapeutics to market sooner, scientists need to move targets through development faster and find more efficient ways to collaborate both inside and outside their organizations.
Another challenge is that large volumes of data acquired by legacy research equipment, such as microscopes and spectrometers, are usually stored locally. This creates a barrier to securely archiving, processing and sharing data with collaborating researchers globally. Improving access to data, securely and compliantly, while increasing usability is critical to maximizing the opportunities to leverage analytics and machine learning.
For instance, Dotmatics' cloud-based software provides simple, unified, real-time access to all research data in Dotmatics and third-party databases, coupled with integrated, scientifically aware informatics solutions for small molecule and biologics discovery. These solutions expedite laboratory workflows and capture experiments, entities, samples and test data, so that in-house or multi-organizational research teams become more efficient.
Today we are seeing a rising wave of healthcare organizations moving to the cloud, which is enabling researchers to unite R&D data with information from across the value chain, while benefiting from compute and storage options that are more cost-effective than on-premises infrastructure.
For large datasets in the R&D phase, large-scale, cloud-based data transfer services can transfer hundreds of terabytes and millions of files at speeds up to 10 times faster than open-source tools. Storage gateways ensure experimental data is securely stored, archived and available to other permissioned collaborators. Uniting data in a data lake improves access and helps to eliminate silos.
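To make the data lake pattern concrete, here is a minimal sketch of archiving local instrument output into cloud object storage; the bucket name, key layout and encryption settings are illustrative assumptions, not a prescribed configuration.

```python
# Minimal sketch: archiving instrument output into an S3-based data lake.
# The bucket name and key layout are hypothetical.
import boto3
from pathlib import Path

s3 = boto3.client("s3")
BUCKET = "research-data-lake"  # hypothetical bucket name


def archive_run(run_dir: str, instrument: str, run_id: str) -> None:
    """Upload every file from a local instrument run, encrypted at rest."""
    for path in Path(run_dir).rglob("*"):
        if path.is_file():
            key = f"raw/{instrument}/{run_id}/{path.relative_to(run_dir)}"
            s3.upload_file(
                str(path), BUCKET, key,
                ExtraArgs={"ServerSideEncryption": "aws:kms"},
            )


archive_run("/data/spectrometer/run-042", "spectrometer-01", "run-042")
```

Once raw files land under a consistent key scheme, downstream analytics and permissioned collaborators can work from the same single copy of the data rather than from departmental silos.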
Cloud-based hyperscale computing and machine learning enable organizations to collaborate across datasets, create and leverage global infrastructures to maintain data integrity, and more easily perform machine learning-based analyses to accelerate discoveries and de-risk candidates faster.
For example, six years ago Moderna started building databases and information-based activities to support all of their programs. Today, they are fully cloud-based, and their scientists don't go to the lab to pipette their messenger RNA and proteins. They go to their web portal, the Drug Design Studio, which runs on the cloud.
Through the portal, scientists can access public and private libraries that contain all the messenger RNA that exists and the thousands of proteins they can produce. Then, they only need to press a button and the sequence goes to a fully automated, central lab where data is collected at every step.
Over the years, data from the portal and lab has helped Moderna improve their sequence design and production processes, as well as the way their scientists gather feedback. In terms of research, all of Moderna's algorithms rely on computational power from the cloud to further their science.
Q: You contend that by supporting greater interoperability and collaboration, data scientists, developers, clinicians and pharmaceutical partners have the opportunity to leverage machine learning to reduce the time it takes to move from insight to discovery. Please elaborate on machine learning's role here in precision medicine.
A: For the last decade, organizations have focused on digitizing healthcare. In the next decade, making sense of all this data will provide the biggest opportunity to transform care. However, this transformation will primarily depend on data flowing where it needs to, at the right time, and on supporting this process in a way that is secure and protects patients' health data.
It comes down to interoperability. It may not be the most exciting topic, but it is one of the most important, and one the industry needs to prioritize. By focusing on interoperability of information and systems today, we can ensure that we end up in a better place in 10 years than where we are now. And so, everything around interoperability – security, identity management, differential privacy – is likely to be part of this future.
Machine learning models trained to support healthcare and life sciences organizations can help automatically normalize, index and structure data. This approach has the potential to bring data together into a more complete view of a patient's medical history, making it easier for providers to understand relationships in the data, compare an individual to the rest of the population, increase operational efficiency and use data to support better patient health outcomes.
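As a rough illustration of what indexing free-text records can look like (a generic sketch, not any specific AWS service), the snippet below vectorizes clinical notes and retrieves related records even when the wording differs:

```python
# Toy sketch: index free-text clinical notes so related records from
# different sources can be linked into one patient view. Notes are invented.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import NearestNeighbors

notes = [
    "Type 2 diabetes, on metformin 500mg, HbA1c 7.2",
    "Pt with T2DM taking metformin; glycated hemoglobin elevated",
    "Fracture of left radius after fall, cast applied",
]

vectorizer = TfidfVectorizer().fit(notes)
index = NearestNeighbors(n_neighbors=2, metric="cosine").fit(
    vectorizer.transform(notes)
)

# Find records related to an incoming note despite different terminology.
_, idx = index.kneighbors(vectorizer.transform(["diabetic patient on metformin"]))
print(idx[0])  # indices of the nearest notes by content
```

A production system would use clinical ontologies and far richer models, but the principle is the same: turn heterogeneous records into a comparable representation, then link them.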
For example, AstraZeneca has been experimenting with machine learning across all stages of research and development, and most recently in pathology to speed up the review of tissue samples. Labeling the data is a time-consuming step, especially in this case, where it can take many thousands of tissue-sample images to train an accurate model.
AstraZeneca uses a machine learning-powered, human-in-the-loop data-labeling and annotation service to automate some of the most tedious portions of this work, resulting in at least 50% less time spent cataloging samples.
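The mechanics behind that saving can be illustrated with a simple confidence-routing loop (a generic sketch, not AstraZeneca's actual pipeline): a model pre-labels samples, and only low-confidence cases go to a human annotator.

```python
# Illustrative human-in-the-loop labeling: auto-accept confident model
# labels, route uncertain ones to annotators. Data and threshold are made up.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_labeled = rng.normal(size=(200, 16))       # already-annotated samples
y_labeled = rng.integers(0, 2, 200)
X_unlabeled = rng.normal(size=(1000, 16))    # samples awaiting labels

model = LogisticRegression(max_iter=1000).fit(X_labeled, y_labeled)
confidence = model.predict_proba(X_unlabeled).max(axis=1)

THRESHOLD = 0.90  # assumed auto-accept cutoff
auto_idx = np.where(confidence >= THRESHOLD)[0]   # keep the model's label
human_idx = np.where(confidence < THRESHOLD)[0]   # send for human review

print(f"auto-labeled: {len(auto_idx)}, routed to annotators: {len(human_idx)}")
```

Every human correction feeds back into the training set, so the share of samples the model can label on its own grows over time.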
Machine learning also helps analysts spot trends and anomalies in health data and derive actionable insights to improve the quality of patient care, predict medical events such as stroke or congestive heart failure, modernize care infrastructure, increase operational efficiency and scale specialist expertise.
Numerate, a discovery-stage pharmaceutical company, uses machine learning technologies to more quickly and cost-effectively identify novel molecules that are most likely to progress through the research pipeline and become good candidates for new drug development.
The company recently used its cloud-based platform to rapidly discover and optimize ryanodine receptor 2 (RYR2) modulators, which are being advanced as new drugs to treat life-threatening cardiovascular diseases.
Ryanodine receptor 2 is a difficult protein to target, but the cloud made that process easier for the company. Traditional methods could not have attacked the problem: the complexity of the biology makes the testing laborious and slow, quite apart from the industry's low 0.1% screening hit rate for much simpler biology.
In Numerate's case, using the cloud enabled the company to effectively decouple the trial-and-error process from the laboratory and discover and optimize candidate drugs five times faster than the industry average.
Machine learning also is helping power the entire clinical development process. Biopharma researchers use machine learning to design the most productive trial protocols, study locations, recruitment and patient cohorts to enroll. Researchers not trained as programmers can use cloud-based machine learning services to build, train and deploy machine learning algorithms to help with pre-clinical studies, complex simulations and predictive workflow optimization.
Machine learning can also help accelerate the regulatory submission process, as the massive amounts of data generated during clinical trials can be captured and shared effectively for collaboration between investigators, contract research organizations (CROs) and sponsor organizations.
For example, the Intelligent Trial Planner (ITP) from Knowledgent, now part of Accenture, uses machine learning services to determine the feasibility of trial studies and forecast recruitment timelines. The ITP platform enables study design teams at pharma organizations to run prediction analysis in minutes, not weeks, allowing them to iterate faster and more frequently.
Powered by machine learning, real-time scenario planning helps facilitate smarter trial planning by enabling researchers to determine the optimal combinations of sites, countries and protocols.
By eliminating poor-performing sites, trial teams have the potential to reduce their trial cost by 20%. And by making data-driven decisions that are significantly more accurate, they can plan and execute clinical trials faster, leading to hundreds of thousands of dollars in savings for every month shaved off a trial.
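In spirit, this kind of recruitment forecasting can be sketched as a regression over historical site performance (a hedged illustration with synthetic data, not the actual ITP model):

```python
# Sketch: forecast enrollment rates per candidate site, drop the slowest,
# and project months to full enrollment. All features and numbers invented.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(1)
# Hypothetical site features: [past patients/month, staff count, catchment]
X_hist = rng.uniform(0, 10, size=(300, 3))
y_hist = X_hist @ np.array([1.5, 0.8, 0.3]) + rng.normal(0, 1, 300)

model = GradientBoostingRegressor().fit(X_hist, y_hist)

candidates = rng.uniform(0, 10, size=(50, 3))
rate = model.predict(candidates)            # expected patients/month per site

TARGET = 600                                # patients needed for the trial
keep = np.argsort(rate)[::-1][:40]          # drop the 10 slowest sites
months = TARGET / rate[keep].sum()
print(f"projected months to full enrollment: {months:.1f}")
```

Re-running such a model under different site mixes is what turns weeks of manual feasibility analysis into minutes of scenario planning.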
Additionally, purpose-built machine learning is supported by cost-effective, cloud-based compute options. For example, high-performance computing (HPC) services can scale quickly to accommodate large R&D datasets, while orchestration tools simplify the use and management of HPC environments.
Data transformation tools can also help to simplify and accelerate data profiling, preparation and feature engineering, as well as enable reusable algorithms both for new model discovery and inference.
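One common embodiment of reusable preparation and feature engineering is a fitted pipeline that serves both model discovery and later inference (a generic scikit-learn sketch, not a specific AWS tool):

```python
# Sketch: one pipeline bundles normalization, feature engineering and the
# model, so the identical preprocessing is reused at inference time.
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
X, y = rng.normal(size=(500, 30)), rng.integers(0, 2, 500)  # synthetic data

pipeline = Pipeline([
    ("scale", StandardScaler()),          # preparation/normalization
    ("features", PCA(n_components=10)),   # feature engineering
    ("model", LogisticRegression(max_iter=1000)),
])
pipeline.fit(X, y)

# New samples pass through the exact same transformations automatically.
print(pipeline.predict(rng.normal(size=(3, 30))))
```

Packaging the steps together is what makes the algorithmic work reusable across discovery and production.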
The healthcare and life sciences industry has come a long way in the last year. However, for progress and transformation to continue, interoperability needs to be prioritized.
Q: The ultimate goal of precision medicine is the right patients receiving the right care, with the right therapeutic, at the right time. What do healthcare provider organization CIOs and other health IT leaders need to be doing with machine learning and other technologies today to be moving toward this goal?
A: The first questions IT leaders need to ask themselves are: 1) If they are not yet investing in machine learning, do they plan to this year? And 2) What are the largest blockers to machine learning adoption on their teams?
Our philosophy is to make machine learning available to every data scientist and developer without the need to have a specific background in machine learning, and then have the ability to use machine learning at scale and with cost efficiencies.
Designing a personalized care pathway using therapeutics tuned for particular biomarkers relies on a combination of different data sources such as health records and genomics to deliver a more complete assessment of a patient's condition. By sequencing the genomes of entire populations, researchers can unlock answers to genetic diseases that historically haven't been possible in smaller studies and pave the way for a baseline understanding of wellness.
Population genomics can improve the prevention, diagnosis and treatment of a range of illnesses, including cancer and genetic diseases, and produce the information doctors and researchers need to arrive at a more complete picture of how an individual's genes influence their health.
Advanced analytics and machine learning capabilities can use an individual's or an entire population's medical history to better understand relationships in the data and, in turn, deliver more personalized and curated treatment.
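At its simplest, combining those data sources means joining records and genomic findings into one view per patient; the toy sketch below (with invented columns) shows the shape of that step:

```python
# Toy sketch: join health records with genomic findings so each patient
# has one combined view. Patients, columns and variants are hypothetical.
import pandas as pd

records = pd.DataFrame({
    "patient_id": [1, 2, 3],
    "diagnosis": ["T2DM", "CHF", "healthy"],
    "hba1c": [7.2, 5.6, 5.1],
})
genomics = pd.DataFrame({
    "patient_id": [1, 2, 3],
    "variant": ["TCF7L2 rs7903146", "TTN truncating", "none"],
})

# One joined row per patient supports biomarker-tuned care pathways.
cohort = records.merge(genomics, on="patient_id")
print(cohort)
```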
Second, healthcare and life sciences organizations need to be open to experimenting with, learning about and embracing cloud and related technologies – and many organizations across the industry are already doing this.
Leaders in precision medicine research such as UK Biobank, DNAnexus, Genomics England, Lifebit, Munich Leukemia Lab, Illumina, Fabric Genomics, CoFactor Genomics and Emedgene all leverage cloud and technology to speed genomic interpretation.
Third, supporting open collaboration and data sharing needs to be a business priority. The COVID-19 Open Research Dataset (CORD-19), created last year by a coalition of research groups, provided open access to the full body of available global COVID-19 research and data.
This was one of the primary factors that enabled the discovery, clinical trials and delivery of the mRNA-based COVID-19 vaccines in an unprecedented timeframe. Additionally, our Open Data Program makes more than 40 openly available genomics datasets accessible, providing the research community with a single documented source of truth.
Commercial examples of machine learning enabling large-scale genomic sequencing include the Munich Leukemia Lab, which has used Field Programmable Gate Array (FPGA)-based compute instances to greatly speed up whole genome sequencing. As a result, what used to take 20 hours of compute time can now be achieved in only three. Another example is Illumina, which is using cloud solutions to offer its customers a lower-cost, high-performance genomic analysis platform that can help speed their time to insights and discoveries.
Twitter: @SiwickiHealthIT
Email the writer: bsiwicki@himss.org
Healthcare IT News is a HIMSS Media publication.