eSource Data Integration


17:51, September 12 2017


Terry Katz, Merck Animal Health, discusses controlled and secured integration of eSource data from various sources

Data are the elements that comprise the outcome of a clinical trial. A long time ago, data were collected on pieces of paper, with scribbles and cross-outs, supplemented with initials and dates. Pharmaceutical development tightened its processes and evolved into today’s galaxy of limited-access databases and remote entry. The industry generally retained the core quest for “truth” by monitoring against the stored hospital records. This time-honored tradition was finally shaken by TransCelerate, with a subsequent reduction in source document verification, replaced by remote monitoring for oddities and inconsistencies.

Slowly, the industry is moving to the next horizon: electronically capturing the data directly into eCRFs (electronic case report forms) at the time of patient presentation. When data are captured first by the EDC (electronic data capture) system, they are classified as eSource. Advantages of eSource include, but are not limited to, enabling real-time remote monitoring, real-time adaptive trials at the patient level, and real-time pharmacovigilance actions.

But the eCRF is not alone. There are other sources of eSource data, and full use of each set can only be achieved when they are integrated.

eCRF Data Collection, Storage and Access

Common eCRF collection is via portals such as a PC or tablet, connected live to the internet. The Investigator enters the data locally, ideally patient-side, and the data is stored in a limited-access secured database on a server or in the cloud. Each Investigator and Staff Member has a unique ID and password, which permanently attributes the data collected to the person who enters it, with date and time stamps. Roles and responsibilities are defined a priori by the clinical team and linked to each user’s unique ID, establishing a control system that restricts access to authorized pages only. For blinded trials, this electronic gatekeeping ensures that persons with access to the clinical observations cannot peek at the randomization and treatment administration pages.
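The gatekeeping described above can be pictured with a minimal sketch. The role names, page names, and the `ROLE_PAGES` map are hypothetical and not taken from any particular EDC product; a real study would define them in its own configuration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

# Hypothetical role-to-page map, defined a priori by the clinical team.
ROLE_PAGES = {
    "blinded_observer": {"clinical_observations", "adverse_events"},
    "unblinded_dispenser": {"randomization", "treatment_administration"},
}


@dataclass(frozen=True)
class User:
    user_id: str  # unique ID that attributes every entry to one person
    role: str


def can_access(user: User, page: str) -> bool:
    """Return True only if the user's role is authorized for the page."""
    return page in ROLE_PAGES.get(user.role, set())


def record_entry(user: User, page: str, field_name: str, value,
                 now: Optional[datetime] = None) -> dict:
    """Store a value with attribution and a UTC date/time stamp."""
    if not can_access(user, page):
        raise PermissionError(f"{user.user_id} may not access page '{page}'")
    stamp = now or datetime.now(timezone.utc)
    return {
        "user_id": user.user_id,
        "page": page,
        "field": field_name,
        "value": value,
        "entered_at": stamp.isoformat(),
    }
```

In this sketch a blinded observer can record a body weight on the clinical observations page, while any attempt to open the randomization page raises `PermissionError`.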

A key item to remember is that eSource has no other reference source. The ability to alter an eSource value is therefore extremely risky. To protect data integrity, edit rights must be limited more tightly than in a comparable EDC holding transcribed data. In the older systems, any questions on the electronic data from the clinical team, QA, or government regulators could be resolved by “going-back-to-the-originating-source” or by documenting a clear self-evident correction. With eSource, ideally only the Investigator who originated the data would have rights to update it, with an expressed justification.
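One way to picture this restriction is a value object that accepts changes only from its originator and only with a stated reason. This is an illustrative sketch; the class name `ESourceValue` and its fields are assumptions, not part of any named system.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ESourceValue:
    field_name: str
    value: object
    originator_id: str  # the Investigator who first entered the value
    history: list = field(default_factory=list)  # audit trail of changes

    def update(self, user_id: str, new_value, justification: str) -> None:
        """Change the value only for the originator, with a justification."""
        if user_id != self.originator_id:
            raise PermissionError("only the originating Investigator may edit")
        if not justification or not justification.strip():
            raise ValueError("a justification is required for every change")
        # The old value is never discarded; it is kept in the audit trail.
        self.history.append({
            "old_value": self.value,
            "new_value": new_value,
            "changed_by": user_id,
            "justification": justification.strip(),
            "changed_at": datetime.now(timezone.utc).isoformat(),
        })
        self.value = new_value
```

Because the previous value is appended to `history` before being replaced, the original entry stays recoverable even after a justified correction.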

Offline Modes and Upload to the Server

Offline electronic data collection is relatively new. It uses PCs, tablets, or smartphones where there is limited or no internet access. While worldwide cell coverage has greatly expanded, many regions still have limited coverage. Even in the USA, depending on the carrier, rural communities in less populated regions may not have Wi-Fi and/or reliable 3G/4G access. Preclinical research in laboratory blocks and veterinary trials in barns and fields may have no internet or reasonable cell reception.

Portables using temporary local storage are a solution. Data collection processes and roles must parallel the web-based access, using identical eCRF forms, with date/time stamps and data attribution, in a submittable audit trail.

Here we face the first integration issue. Local storage needs to be uploaded to the global server. Portable devices may need to be physically taken to an internet access point, to upload via Wi-Fi or hardwired connections. The need for identical eCRFs becomes apparent when the data is pushed from the device to the central server, which already holds equivalent data collected live via the internet. The local audit trail must also be pushed to the server and chronologically integrated.
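Chronological integration of the two audit trails can be sketched as a stable sort on the time stamp. The entry layout (`at`, `source`, `action`) is a hypothetical one chosen for illustration, and the sketch assumes every stamp is an ISO 8601 string carrying a UTC offset so that device and server clocks are comparable.

```python
from datetime import datetime


def merge_audit_trails(server_trail: list, device_trail: list) -> list:
    """Combine two audit trails into one list ordered by time stamp.

    sorted() is stable, so when two entries share a stamp the server
    entry, which is listed first, stays ahead of the device entry.
    """
    combined = list(server_trail) + list(device_trail)
    return sorted(combined, key=lambda e: datetime.fromisoformat(e["at"]))
```

The sketch orders entries only by the recorded stamp; it does not address clock drift on the portable device, which a real system would need to consider.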

A one-way push from device to server is, however, too simplistic. Data integrity would dictate that the device and server build in a feedback loop to ensure that (a) all data is uploaded, and (b) all data integrated on the server matches the information from the device. Ideally, the system only uploads “new” data (as opposed to the entire contents of the device) to reduce the demands on the transfer process.
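The feedback loop can be sketched as follows: the device sends only unconfirmed records, the server answers with a digest of what it actually holds, and the device marks a record as confirmed only when the digests agree. The `Device` and `Server` classes and the `record_id` key are hypothetical, and SHA-256 over canonical JSON is just one possible way to compare contents.

```python
import hashlib
import json


def digest(record: dict) -> str:
    """Checksum of a record, computed over a canonical JSON rendering."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class Server:
    def __init__(self):
        self.records = {}

    def receive(self, batch: list) -> dict:
        """Store new records and return a digest of what is now held."""
        receipts = {}
        for rec in batch:
            rid = rec["record_id"]
            stored = self.records.setdefault(rid, dict(rec))  # never overwrite
            receipts[rid] = digest(stored)
        return receipts


class Device:
    def __init__(self):
        self.records = {}
        self.confirmed = set()  # IDs the server has verifiably stored

    def collect(self, record: dict) -> None:
        self.records[record["record_id"]] = record

    def sync(self, server: Server) -> list:
        """Upload only unconfirmed records; return IDs that failed to verify."""
        pending = [r for rid, r in self.records.items()
                   if rid not in self.confirmed]
        receipts = server.receive(pending)
        mismatched = []
        for rec in pending:
            rid = rec["record_id"]
            if receipts.get(rid) == digest(rec):
                self.confirmed.add(rid)
            else:
                mismatched.append(rid)
        return mismatched
```

A non-empty return value from `sync` signals that the server's copy differs from the device's copy, which would then need follow-up rather than silent acceptance.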

Once uploaded and confirmed, two major choices arise: (1) delete the data from the device, leaving only the server as the eSource repository across all sites, or (2) back-populate the device with updated information from other devices that fall within that Investigator’s practice. In a preclinical study, multiple devices are often used by multiple staff members. Staffers on one offline device cannot see the data on the other offline devices until they are integrated with the central server. If those integrated contents are important for the next clinical observation (say, body weights to enable adjusted dosing), there is value in back-populating all of the local devices.

One major complexity of using multiple offline input tools is that two or more users could theoretically measure the same subject for the same timepoint. In an online mode, those fields would have already been populated and the second user would not re-measure, but offline devices do not talk to each other.  Integration here becomes a challenge. A key principle to follow is that the first population of the integrated server becomes the original eSource data. Any subsequent data collection by another offline device cannot replace or update the original data stored.  Like any other data conflict, the secondary data must be saved in the local device and audit trail, and resolved by Data Management/Clinical with the Investigator via the query resolution process.
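The first-population principle can be sketched as a store that accepts the first value uploaded for a given subject, timepoint, and field, and routes any later value to a query list instead of overwriting. The `CentralStore` class and the record keys are hypothetical names used for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class CentralStore:
    # (subject_id, timepoint, field) -> first record integrated
    values: dict = field(default_factory=dict)
    # secondary records awaiting query resolution with the Investigator
    queries: list = field(default_factory=list)

    def integrate(self, record: dict) -> str:
        key = (record["subject_id"], record["timepoint"], record["field"])
        original = self.values.get(key)
        if original is None:
            self.values[key] = record  # first upload becomes the eSource
            return "accepted"
        if original == record:
            return "duplicate"  # same record re-sent; nothing to resolve
        # A different measurement for the same slot never replaces the
        # original; it is kept and raised as a query.
        self.queries.append({"key": key, "original": original,
                             "secondary": record})
        return "conflict"
```

Note that "first" here means first to reach the server, as in the text, not first measured; the secondary value is retained so the query process can decide what to do with it.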

Other Data

While eSource is mostly discussed in terms of the patient-side eCRF, there are other eSource workstreams. In a hybrid data collection model, some data may instead be directly captured into the hospital patient record. These EHRs (electronic health records) are maintained by each physician practice, by hospitals, or by a research network. Similarly, hematology, blood chemistry and urinalysis data could be generated by auto-analyzers, with the eSource stored in the central laboratory server.

What is a data manager to do? The statisticians want all data together, in a common location and a common format. So would the FDA or any other government agency that requests submission of the data. In the past, integration of this data was achieved by manually transcribing the paper lab printouts or by having a site coordinator copy the EHR results to the eCRF. Lab transfers and EHR extracts avoid the time and errors associated with transcription, but test the skills of the programmers who must bring disparate layouts into a common form. Today this integration process is less burdensome with the adoption of CDISC modules and nomenclature (especially CDASH, SDTM, and SEND). But EHRs mostly follow HL7 (Health Level 7) Messaging, not CDISC, for their structure and nomenclature. New tools like HL7 FHIR (Fast Healthcare Interoperability Resources) strive to ease integration as well as provide interoperability among various EHRs. Greater use of XML (Extensible Markup Language) is another avenue for transferring laboratory data in a form submittable to the FDA. However, those lab XML transfers may not be in CDISC format, requiring re-formatting before integration. Non-rectangular XML composed of multiple relational databases would also need to be parsed before integration with the eCRF data.
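Parsing nested lab XML into rectangular rows can be sketched with Python's standard `xml.etree.ElementTree`. The XML layout below is invented for illustration, and the output column names are borrowed from the SDTM LB domain only as an example; this is not a validated CDISC mapping.

```python
import xml.etree.ElementTree as ET

# Hypothetical lab transfer: results nested under samples under subjects.
SAMPLE_XML = """
<labReport>
  <subject id="S01">
    <sample collected="2017-09-01T08:30">
      <result test="ALT" unit="U/L">42</result>
      <result test="GLUC" unit="mg/dL">95</result>
    </sample>
  </subject>
  <subject id="S02">
    <sample collected="2017-09-01T09:10">
      <result test="ALT" unit="U/L">37</result>
    </sample>
  </subject>
</labReport>
"""


def flatten_lab_xml(xml_text: str) -> list:
    """Turn the nested subject/sample/result tree into one row per result."""
    root = ET.fromstring(xml_text.strip())
    rows = []
    for subject in root.findall("subject"):
        for sample in subject.findall("sample"):
            for result in sample.findall("result"):
                rows.append({
                    "USUBJID": subject.get("id"),
                    "LBTESTCD": result.get("test"),
                    "LBORRES": (result.text or "").strip(),
                    "LBORRESU": result.get("unit"),
                    "LBDTC": sample.get("collected"),
                })
    return rows
```

Each row repeats the subject and collection time carried by its parent elements, which is what makes the output rectangular and ready to be joined with eCRF data.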


Terry Katz

Head of Global Data Management and Statistics

Merck Animal Health




