The use of machine learning in the design, conduct and interpretation of clinical trials has been eagerly anticipated. Machine learning has the potential to increase the efficiency of almost every aspect of medical research, from identifying appropriate patients to optimizing how study supplies are sent to clinical sites. The arrival of electronic tools in medical research parallels the promise of precision medicine for patients: improving benefit while reducing risk.

This shift toward machine learning is already under way. Two devices coupled with artificial intelligence were recently approved in the U.S.: one is designed to assist physicians in detecting early retinopathy, and the other is intended for treating strokes. The pace of incorporating electronic data capture and machine learning into clinical trials, however, has been relatively slow despite the increasing need for more efficient processes.

Machine Learning Will Become Essential for Selecting Patients

Rapid and efficient patient selection for clinical trials requires access to, and interrogation of, large databases. As we design more specific treatments, sifting through increasingly complex variables will require machine learning. Only with efficient electronic searches will we be able to enroll patients with the desired baseline characteristics.

The drive toward precision medicine will change the way we select patients, even in diseases with large patient populations that are now considered homogeneous. While this trend is most obvious in oncology, where patients are identified by the genetic characteristics of their tumors, the future promises similar results for many chronic diseases. As an illustration of this progression, it was recently reported that a team of researchers in Finland and Sweden stratified diabetes patients into five subgroups with different rates of disease progression and risk of diabetic complications. This analysis required a database containing approximately 15,000 newly diagnosed patients.
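To make the idea of data-driven stratification concrete, the following is a minimal sketch of k-means clustering in plain Python. It is not the method used in the study mentioned above. The two baseline variables (age at diagnosis and body mass index), the six toy patients, and the function names are all hypothetical and chosen only for illustration.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two patients' feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: Sequence[Point], k: int, iterations: int = 20,
           seed: int = 0) -> Tuple[List[int], List[Point]]:
    """Group points into k clusters; returns (labels, centroids)."""
    rng = random.Random(seed)
    # Start from k distinct patients picked at random (seeded for repeatability).
    centroids = rng.sample(list(points), k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each patient joins the nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members)
                                     for col in zip(*members))
    return labels, centroids


# Hypothetical toy data: (age at diagnosis, body mass index).
patients = [
    (30.0, 22.0), (32.0, 23.0), (31.0, 21.0),
    (60.0, 33.0), (62.0, 35.0), (61.0, 34.0),
]
labels, centroids = kmeans(patients, k=2)
for patient, label in zip(patients, labels):
    print(patient, "-> subgroup", label)
```

A real analysis would involve thousands of patients and many more variables, would standardize those variables before clustering, and would require clinical validation before any subgroup is used to select patients for a trial.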

Collecting Data, Predicting Trial Results, and Generating Conclusions Will Be Enhanced by Machine Learning

Clinical trials are now moving into an era of continuous electronic data collection using devices and techniques such as wearable sensors, mobile phones, electronic journals and digital imaging. Machine learning will be required to use these data efficiently. Ongoing data collection and analysis while the trial is being conducted will allow continuous modeling of projected trial results, adjusted for the patient population actually being enrolled. This ongoing evaluation will allow safety monitoring to be enhanced and corrective actions to be taken to increase the trial’s chance of success.
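One simple way to picture continuous modeling as data accrue is Bayesian updating of a response rate. The sketch below assumes a binary outcome (responder or non-responder), a uniform Beta(1, 1) prior, and made-up interim batches. The class name and the numbers are hypothetical, and the model is far simpler than the machine learning approaches a real trial would use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BetaPosterior:
    """Beta distribution over a response rate; Beta(1, 1) is a uniform prior."""
    alpha: float = 1.0
    beta: float = 1.0

    def update(self, responders: int, non_responders: int) -> "BetaPosterior":
        """Return the posterior after observing a new batch of patients."""
        return BetaPosterior(self.alpha + responders,
                             self.beta + non_responders)

    @property
    def mean(self) -> float:
        """Current estimate of the response rate."""
        return self.alpha / (self.alpha + self.beta)


# Hypothetical interim batches: (responders, non_responders).
interim_batches = [(3, 7), (6, 4), (5, 5)]

posterior = BetaPosterior()
for responders, non_responders in interim_batches:
    posterior = posterior.update(responders, non_responders)
    print(f"Estimated response rate so far: {posterior.mean:.3f}")
```

Each new batch refines the estimate without discarding earlier data, which is the basic mechanism behind re-projecting trial results as enrollment proceeds. Any interim analysis in an actual trial would need to be prespecified in the protocol and statistical analysis plan.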

Machine learning is essential as our growing knowledge of genetic and biomarker variables exponentially increases the amount of data that must be interpreted. Our analytic abilities must continuously improve if we are to draw appropriate conclusions about the association between increasingly complex baseline characteristics and the safety and efficacy of the treatment.

The ultimate goal of any program is to combine and analyze all of the information gathered during the development process in order to reach conclusions on the benefit-risk profile. For products already on the market, new clinical trial information must be readily combined with real-world data. Only sophisticated data collection and machine learning will increase the speed and comprehensiveness of these analyses.

Limited Progress Made Fully Incorporating Machine Learning into Clinical Development. So Far…

While the potential for machine learning in medical research has been envisioned for many years, our progress towards making it a reality has been painfully slow. This slow pace persists in spite of significant efforts by regulators, academic institutions, and the pharmaceutical industry. Currently, many trials incorporate some aspects of electronic data capture and machine learning, but very few trials use these technologies fully. Many of the efforts are pilot programs that are not incorporated into common practice.

This gradual and incremental rate of incorporating machine learning into clinical research is largely due to a culture of risk avoidance. This culture is understandable and reasonable given the potential for harm to patients if methods with poor validation are employed. Patient privacy and safety concerns must always be prioritized over speed and efficiency. Regulators, IRBs, investigators and patient advocacy groups must all be convinced that any new process protects the patients enrolled in a clinical trial.

Along with the safety and privacy of patients, data integrity must be ensured. ICH and GCP guidelines must be followed for any trials utilizing electronic data capture and machine learning analyses. Training is another consideration for any improved digital system. Investigators and patients must be able to understand how to use the electronic tools being incorporated into trials, and that understanding must be documented.

The FDA Is Currently Exploring Ways to Accelerate Change

Global regulatory agencies are increasingly aware of the benefits of machine learning, and the FDA, for example, is adapting to this approach. According to the FDA, the 21st Century Cures Act legislation is “…designed to help accelerate medical product development and bring new innovations and advances to patients who need them faster and more efficiently.”

In April of this year, Dr. Scott Gottlieb, commissioner of the FDA, reviewed new FDA initiatives designed to bring digital health care into drug development. “As we begin for the first time to address the role of digital health in drug development, we’ll work to ensure our regulatory approach reflects the novel nature of these products and encourages and supports their innovation. We must recognize the potential of digital health as a new tool to improve the safety and effectiveness of drug delivery.”

Leadership Required to Usher in Era of Machine Learning in Clinical Trials

Given that the current efforts of stakeholders in clinical research have yet to achieve the goal of maximizing the use of machine learning, what can be done to accelerate the process? How can we move from our current state where much of our data is still captured on paper and analyzed inefficiently?

Leadership and investment are required to change the current status. Given the risks involved and the tremendous investment in infrastructure that is needed, only a focused and coordinated process will be successful. Academic, regulatory and industry R&D leaders will have to step forward and insist that our current processes are no longer viable. Information gleaned from pilot efforts must be incorporated into standard practice. A coordinated road map with fixed goals is required for change.

Other stakeholders must also do their part. Patient advocacy groups need to insist that the most modern methodologies are used to find and treat their patients. Patients should select trials that are designed to improve the trial experience through better data collection processes. Investigators participating in clinical research must be willing to accept new technology when they join a trial and commit the time and effort to train themselves and their staff in new techniques. IRBs need to insist that the most up-to-date methodologies are incorporated into trials to allow adequate benefit-risk assessments.

A complete effort from all of the stakeholders in clinical trials is required if we wish to accelerate our progression into a modern age of medical research. This effort will not be an easy one, but the gains to be achieved from modernizing our conduct of clinical trials are too great a prize to delay.


Jim Stolzenbach


Jim Stolzenbach Consulting LLC