Real-world data (RWD) are playing an increasing role in health care decisions. Combining various datasets provides complementary insights and may facilitate key decision making. Data linkage and integration processes are nonetheless associated with technical, ethical and regulatory hurdles that must be acknowledged. Linkage between clinical and claims data deserves special attention and advance planning, as illustrated by the linkage between clinical data (Clinical Studies and Patient Registries) and the système national des données de santé (SNDS) [national health data system].
What are the Research Interests for the Secondary Use of SNDS Data?
In 1999, French legislators decided to develop a système national d’information interrégimes de l’Assurance Maladie (SNIIRAM) [national health insurance information system] in order to determine and evaluate health care utilization (HCU) and expenditure. Today, these data cover the care of almost 66 million inhabitants. A 1/97th random sample of SNIIRAM, the échantillon généraliste des bénéficiaires (EGB) [general sample of health insurance beneficiaries], was developed in 2005 to allow a 20-year follow-up with facilitated access for medical research.
The EGB is an open cohort that includes new beneficiaries and newborn infants while keeping the HCU of deceased beneficiaries. SNIIRAM has continued to grow and, in 2016, became the cornerstone of the système national des données de santé (SNDS) [national health data system]. Additionally, SNIIRAM has gradually integrated new information into its system, including causes of death, social and medical data, and complementary health insurance.
The main research interests of the SNDS comprise health monitoring (including disease epidemiology), analysis of (para-)medical practices, monitoring of policy implementation, health economics, and pharmacoepidemiology. One limitation is the lack of detailed clinical data (no diagnosis, except for hospital admissions and long-term disease status) and of results of paraclinical examinations.
Only events accompanying the diagnosis or complicating the disease are coded (ICD-10 codes) in the hospital database. Efforts are nonetheless underway to develop algorithms that identify prevalent or incident patients by combining diagnostic codes, medical procedures and drugs (ATC codes) used as proxies (such as antidiabetic or hypoglycaemic drugs to identify diabetic patients). Lastly, demographic data are limited.
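As an illustration of such a proxy algorithm, the sketch below flags a patient as probably diabetic when a hospital diagnosis falls in ICD-10 E10 to E14, or when antidiabetic drugs (ATC class A10) were dispensed on several distinct dates. The record layout, the function name and the threshold of three dispensing dates are assumptions made for this example; they are not the definition used in the SNDS.

```python
from collections import defaultdict

# Hypothetical, simplified records: (patient_id, code, date).
# Real SNDS tables are structured differently.
DIABETES_ICD10_PREFIXES = ("E10", "E11", "E12", "E13", "E14")
ANTIDIABETIC_ATC_PREFIX = "A10"  # ATC class "drugs used in diabetes"


def flag_diabetic_patients(diagnoses, dispensings, min_dispensing_dates=3):
    """Return the set of patient ids flagged by either proxy.

    min_dispensing_dates is an illustrative threshold, not an official one.
    """
    flagged = set()

    # Proxy 1: a hospital diagnosis coded as diabetes mellitus.
    for patient_id, icd10, _date in diagnoses:
        if icd10.upper().startswith(DIABETES_ICD10_PREFIXES):
            flagged.add(patient_id)

    # Proxy 2: antidiabetic drugs dispensed on enough distinct dates.
    dates_by_patient = defaultdict(set)
    for patient_id, atc, date in dispensings:
        if atc.upper().startswith(ANTIDIABETIC_ATC_PREFIX):
            dates_by_patient[patient_id].add(date)
    for patient_id, dates in dates_by_patient.items():
        if len(dates) >= min_dispensing_dates:
            flagged.add(patient_id)

    return flagged


# Invented example data.
diagnoses = [("p1", "E11.9", "2017-03-02"), ("p2", "I10", "2017-05-10")]
dispensings = [
    ("p2", "A10BA02", "2017-01-05"),
    ("p2", "A10BA02", "2017-02-06"),
    ("p2", "A10BA02", "2017-03-07"),
    ("p3", "A10BA02", "2017-01-05"),
    ("p3", "C09AA05", "2017-02-01"),
]
print(sorted(flag_diabetic_patients(diagnoses, dispensings)))  # ['p1', 'p2']
```

Patient p3 is not flagged because a single dispensing date falls below the illustrative threshold; lowering the threshold trades specificity for sensitivity.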
The modalities of data access and the data protection systems have changed. The 2016 health law set up the Institut national des données de santé (INDS) [national health data institute], and the GDPR has since taken effect. The INDS consists of:
- Representatives of the State
- Patient and health system user associations
- Health data producers
- Public and private health data users (including Clinical Research Organizations [CROs])
The French personal data protection law has created, within the framework of the INDS, a comité d’expertise pour les recherches, les études et les évaluations dans le domaine de la santé (CEREES) [expert committee for research, studies and evaluations in the field of health].
When a research project falls outside the remit of the French Reference Method, this committee is asked to examine the personal data processing scheme within one month, prior to submission to the CNIL (French data protection authority). After CNIL approval (within a maximum of four months), data are made available to the research team in a secure space (for information, since May 2018, data have been made available four to six months after protocol submission).
The use of SNDS data is made complex by its large volume and by its structure, which is determined by the primary roles of national health insurance. Analysis of these data requires specific analytical skills and an understanding of the billing practices that determine access to given therapies.
Why can Primarily Collected Clinical Data and SNDS Data be Considered Complementary?
In a clinical study, the researcher’s objectives and budget restrictions result in a limited scope of collected data. For example, a clinical study designed to assess the Overall Response Rate over a few months may be uninformative for survival, depending on expected survival durations. Other outcomes can also be obtained from the SNDS beyond the clinical study, such as primary or secondary care data and procedures performed before or after the study timeframe.
Interestingly, HCU data can be used as an exhaustive and valid (no recall bias) source of explanatory factors to build prediction models. This can help identify subgroups with optimal efficacy and safety, or otherwise predict long-term outcomes and long-latency effects.
Additionally, health care utilization data available during the study period allow post hoc checks of data quality that support the validity of the findings. SNDS data may also be linked with French patient registries to assess sample representativeness, which is one of the main challenges to the scientific credibility of registries.
How can Clinical Data be Linked with SNIIRAM Data?
From a technical perspective, linkage can be probabilistic, using a minimal number of specific HCU variables common to both datasets to ensure individual matching with a low probability of error. Linkage can also be deterministic when the SNDS patient identifier is also collected in the clinical dataset.
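When a shared identifier is available, deterministic linkage reduces to a key join. The minimal sketch below assumes that both datasets carry the same pseudonymized identifier in a hypothetical field named `pseudo_id`; the field names and records are invented for illustration and do not reflect actual SNDS structures.

```python
def deterministic_link(clinical_rows, claims_rows, key="pseudo_id"):
    """Join clinical rows to claims rows that share the same identifier.

    Returns (linked, unlinked): linked pairs each clinical row with the list
    of its claims rows; unlinked holds clinical rows with no claims found.
    """
    # Index the claims rows by identifier for constant-time lookup.
    claims_by_key = {}
    for row in claims_rows:
        claims_by_key.setdefault(row[key], []).append(row)

    linked, unlinked = [], []
    for row in clinical_rows:
        matches = claims_by_key.get(row[key])
        if matches:
            linked.append((row, matches))
        else:
            unlinked.append(row)
    return linked, unlinked


# Invented example data.
clinical = [
    {"pseudo_id": "a1", "response": "partial"},
    {"pseudo_id": "b2", "response": "complete"},
]
claims = [
    {"pseudo_id": "a1", "event": "hospital stay"},
    {"pseudo_id": "a1", "event": "drug dispensing"},
    {"pseudo_id": "z9", "event": "consultation"},
]
linked, unlinked = deterministic_link(clinical, claims)
print(len(linked), len(unlinked))  # 1 1
```

Reporting the unlinked rows alongside the linked ones makes the linkage rate explicit, which matters when judging the validity of the final dataset.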
With the probabilistic method, an acceptable matching process depends on the ability to identify common variables in both datasets that are sufficiently specific. In the case of secondary use of linked clinical data and SNDS data, the validity of the final dataset will depend on the availability of those common variables. In the case of a clinical Primary Data Collection (PDC) enriched with SNDS data, study objectives should be detailed and the study design should address this specific data collection need.
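One way to gauge whether a set of common variables is specific enough is to measure how often their combination is unique, and to link only records with exactly one candidate. The sketch below is a simplified exact match on shared variables under stated assumptions, not a weighted probabilistic model; the variable names (`birth_year`, `sex`, `admission_date`, `hospital_id`) and the records are hypothetical.

```python
from collections import Counter

# Hypothetical common variables; the real choice depends on both datasets.
MATCH_VARS = ("birth_year", "sex", "admission_date", "hospital_id")


def match_key(row, variables=MATCH_VARS):
    return tuple(row[v] for v in variables)


def share_unique(rows, variables=MATCH_VARS):
    """Fraction of rows whose combination of variables is unique."""
    if not rows:
        return 0.0
    counts = Counter(match_key(r, variables) for r in rows)
    return sum(1 for r in rows if counts[match_key(r, variables)] == 1) / len(rows)


def link_unique_matches(clinical_rows, claims_rows, variables=MATCH_VARS):
    """Link a clinical row only when exactly one claims row shares its key
    and that key is also unique among the clinical rows."""
    claims_index = {}
    for r in claims_rows:
        claims_index.setdefault(match_key(r, variables), []).append(r)
    clinical_counts = Counter(match_key(r, variables) for r in clinical_rows)

    linked, ambiguous, unmatched = [], [], []
    for r in clinical_rows:
        k = match_key(r, variables)
        candidates = claims_index.get(k, [])
        if not candidates:
            unmatched.append(r)
        elif len(candidates) == 1 and clinical_counts[k] == 1:
            linked.append((r, candidates[0]))
        else:
            ambiguous.append(r)
    return linked, ambiguous, unmatched


def make_row(row_id, birth_year, sex, admission_date, hospital_id):
    return {"id": row_id, "birth_year": birth_year, "sex": sex,
            "admission_date": admission_date, "hospital_id": hospital_id}


# Invented example data.
clinical = [
    make_row("S1", 1950, "F", "2017-03-02", "H1"),
    make_row("S2", 1962, "M", "2017-04-11", "H1"),
    make_row("S3", 1971, "F", "2017-06-20", "H2"),
]
claims = [
    make_row("C1", 1950, "F", "2017-03-02", "H1"),
    make_row("C2", 1962, "M", "2017-04-11", "H1"),
    make_row("C3", 1962, "M", "2017-04-11", "H1"),
    make_row("C4", 1980, "M", "2017-01-15", "H2"),
]
linked, ambiguous, unmatched = link_unique_matches(clinical, claims)
print(share_unique(claims))  # 0.5
print([(a["id"], b["id"]) for a, b in linked])  # [('S1', 'C1')]
```

In this toy example, S2 stays ambiguous because two claims records share its key, which signals that the chosen variables are not specific enough for that patient.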
Whatever the technical linkage method, patient privacy rules should be strictly applied and must be consistent with the GDPR.