Challenges and Opportunities of Patient Reported Outcome Instruments in Clinical Trials: Perspectives from a PRO Scientist


17:23, June 22 2017


Katja Rüdell, Patient Reported Outcomes expert, offers insight into the usage of PROs in clinical trials and how they’ve evolved over time

I have been asked to contribute to Clinical Trials Arena to provide my perspective on the use of patient reported outcome (PRO) instruments past and present. As well as taking a retrospective outlook, I will look forward to see what the future could hold for PRO instruments in the clinical trial space and beyond.

Let us start with a short summary of what patient reported outcomes are, for those who are not familiar with them. A PRO instrument is a self-reported outcome measure that patients can complete as part of their clinical trial. It typically measures that which can only be measured by patient report, such as a person’s feelings. A PRO instrument contains a concept of measurement (e.g. symptoms, sensation, functioning or quality of life) and an element of evaluation (a rating scale of some kind). Some refer to the latter as the ‘percept’, meaning the perceived or internally reconstructed intensity of the experience. In pharmaceutical clinical trials, PRO instruments are frequently used to evaluate how investigational products affect the patient’s subjective experience while in treatment.

Some describe these endpoints as ‘soft’ because perception is invariably fluid and depends somewhat on the current emotional state of the individual. They are often used to supplement the harder outcome measurements that are captured within each clinical trial (e.g. biomarkers such as heart rate, blood pressure and tumor size, or events such as knee replacement and other common morbidity and mortality effects). To evaluate the efficacy of an investigational product, all such endpoints are traditionally analyzed using statistical tests that estimate how likely it is that a given product (for example an analgesic) affects the concept (e.g. pain) as measured by the endpoint (e.g. mean pain ratings per treatment week). An evaluation is then made to see whether, at the end of the trial, the product provides a better so-called benefit-risk ratio. This evaluation gauges whether the product has provided a benefit to the patient or caused harm. The benefit-risk ratio is evaluated against other treatments to gauge the likelihood of reimbursement and to justify a given price.
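To make the endpoint example above concrete, here is a minimal sketch of how mean pain ratings per treatment week might be derived from diary entries and compared between two arms. The diary values, arm names and function names are all invented for illustration; a real trial would use a pre-specified statistical model rather than a simple difference in means.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical diary entries: (treatment arm, study week, pain rating).
# All values are invented for illustration only.
DIARY = [
    ("analgesic", 1, 6), ("analgesic", 1, 5),
    ("analgesic", 2, 4), ("analgesic", 2, 3),
    ("placebo", 1, 6), ("placebo", 1, 7),
    ("placebo", 2, 6), ("placebo", 2, 6),
]

def weekly_mean_pain(entries):
    """Return {(arm, week): mean rating} for a list of diary entries."""
    grouped = defaultdict(list)
    for arm, week, rating in entries:
        grouped[(arm, week)].append(rating)
    return {key: mean(ratings) for key, ratings in grouped.items()}

def arm_difference(means, week, active="analgesic", control="placebo"):
    """Difference in mean pain (active minus control) for one week."""
    return means[(active, week)] - means[(control, week)]

means = weekly_mean_pain(DIARY)
week2_difference = arm_difference(means, week=2)
```

A negative difference here would mean lower average pain in the active arm than in the control arm for that week. Whether such a difference is meaningful to patients is a separate question that this sketch does not address.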

The main commercial value of PRO use in the late 1990s appeared to lie mostly in differentiating new medications from old ones, allowing the industry to capture better outcomes for the patient while showing the value of innovation to payers and reimbursement agencies. Initially, PRO instruments were designed for differentiation purposes. Many of the legacy PRO instruments were developed by clinicians who listed only the patient concerns that they could address and had treatments for.

Understandably, most clinicians have limited time in a consultation and are trained in medical school to listen mostly for concepts and sensations that are treatable. As a result, clinicians pay less attention to imprecise and unclearly defined wording or to untreatable conditions, and ultimately there is little they can offer to patients with a more unusual case. The pattern analysis that physicians use simplifies disease and symptoms to support quicker diagnosis and effective treatment, but it may not give a comprehensive picture of the disease as any one patient experiences it.

Furthermore, since physicians, rather than patients, developed the original and legacy PRO tools, the instruments generally capture the physician’s perspective. This raises concerns about how such tools were developed and about researcher bias in developing them. Let’s take, for instance, the measurement of depression. Old-fashioned instruments contained idioms such as ‘feeling blue,’ which did not translate easily into other languages. Oftentimes, the language chosen by the physician has been much more advanced or elaborate than that of the average patient (the average reading age across populations is estimated at only around 10 years).

Additionally, the way of measuring the intensity, or the percept, varies. Every clinician and psychometrician has their own preference as to what may be the most user-friendly, sensitive and responsive measure to use. Some favor the 1-10 Numerical Rating Scales (NRS), others the 1-100 Verbal Anchoring Scales, and others the 1-5 Likert agreement scales (where 1 and 2 generally represent the negative side, 4 and 5 the positive side, and 3 is a neutral point).

The inevitable outcome of such a variety of preferences was the development of numerous different PRO instruments for clinical trials. For reimbursement bodies and regulatory agencies, such as the Food and Drug Administration (FDA) and the European Medicines Agency (EMA), this was and probably still is confusing. How can Company A evaluate its drug with instrument A and Company B with instrument B, and both claim success when the instruments differ? The agencies therefore produced regulatory guidance to standardize the use and development of PRO instruments and reduce the complexity. In 2005, the EMA brought out a CHMP (Committee for Medicinal Products for Human Use) guideline on the development and use of HRQL (Health-Related Quality of Life) measures in clinical trials. Meanwhile, in 2006, the FDA produced a draft PRO guidance for labeling claims, which was finalized in 2009.

The FDA guidance was the most influential for industry and had, in my opinion, three main repercussions. First and foremost, it cemented the involvement of patients in the early development of PRO tools wherever the manufacturer sought to describe the effects of the medication on the product label. A conceptual model outlines how sensations relate to one another; in other words, it depicts the problems patients face from their own perspective. Ideally this should be established for every instrument before the instrument is fully constructed. This meant something of a shift from the clinician’s to the patient’s perspective, which the agencies considered most important. Secondly, the guidance specified that clarity needed to be given in advance of the study, and in discussion with the agency, about the potential strength and duration of a likely therapeutic effect. Furthermore, all effects should be pre-specified and based on evidence gathered in phase II of drug development.

Finally, strict guidelines were given around optimal implementation of the instruments in the trial. This concerned mostly the administration of the instruments at the clinical trial site, as there were concerns around falsification of records and around capturing the wrong time frame for an observed therapeutic effect, for example through delays in data entry. Some scientists interpreted the guidance to mean that electronically administered PRO instruments, or ePROs, were clearly preferred, as each entry could be date-stamped. The data received were therefore by default more accurate as to when the effect was observed.

Since the release of the guidance, we have seen an enormous increase in the number of ePRO instruments employed in clinical trials, helped along by the success of smartphone technology. I would assume that this trend is likely to continue and that we will see better and more precise data that speak exactly to when the treatment effect is observed. On the other hand, the requirement to pre-specify therapeutic effects on softer endpoints, although well intentioned, has actually become a hindrance both to communicating patient reported effects and to informing patients about the softer effects of the medication. In oncology, for example, compounds are tested in small groups of patients (n=1-10) in phase I, followed by a slightly larger dose expansion cohort (up to 20-30 patients per cohort), and then go straight into a combined phase II and III registration trial with a small group of patients.

While sample sizes are powered on survival and progression endpoints, it is much less useful, from both a manufacturer’s and a scientific perspective, to collect PRO data in early phase I or II studies, because the numbers are too small to support useful statistical inferences. It is therefore very difficult to plan for a likely treatment effect and to consult with the agency about likely labeling within a condensed time frame of three years.

Precision medicine allows industry to innovate on a much smaller scale (targeted subpopulations). Payers and patient consumers still want PRO data to be available to them, but there are often not enough patients to analyze the data statistically. I suspect we may see regulatory agencies, patients and industry converge on increasing the dissemination of PRO effects in graphical displays, with less of a focus on statistical tests for significance and more on descriptive information such as ‘you can expect your symptoms to improve somewhat after two weeks or so.’ Without reducing scientific rigor, this would help address the increasing concerns in the data sciences around the arbitrariness of the p-value.

In conclusion, the field of patient reported outcomes remains exciting. PROs are essential for exploring the value of medication and for supporting reimbursement. Although I expect PRO data may need to be simplified for the average consumer (I am hoping to see something similar to food nutritional labels), I still believe that over time we should expect to see more and better PRO data, with a continued focus on the patient’s perspective and potentially a standardization of statistical approaches. These may include visual displays, which should be useful for clinicians, the industry and, most importantly, the patients.