Real-world data (RWD) is transforming healthcare, driven by the changing priorities of pharma companies and the growing influence of AI, with notable advances expected in 2026. RWD encompasses health information gathered outside clinical trials, from sources such as electronic health records, insurance claims, public health systems, and wearable devices. However, rather than simply collecting huge datasets, the focus has shifted towards securing high-quality, representative data that can be used for specific research applications.
This move away from collating as much data as possible has increased demand for project-specific, fit-for-purpose data, particularly as the limitations of one-size-fits-all approaches become apparent. While claims data remains a cornerstone, there is growing interest in integrating clinical and laboratory data tailored to distinct study objectives.
At the same time, enthusiasm for AI is being tempered by the complexity of healthcare and the need for robust, proven systems, with rising interest in AI models designed for specific use cases within defined boundaries. This evolving landscape also highlights the importance of flexible, open platforms that allow for interoperability and choice of analytics, ensuring that solutions remain adaptable to varying client needs.
High-quality, specific data ensures that findings closely reflect the realities researchers and clinicians are seeking to understand and builds greater trust in any conclusions drawn. This approach reduces the risk of misinterpretation, positioning researchers and sponsors to deliver real impact in both research and patient care.
The shift from quantity to quality
Relying solely on high volumes of data without representativeness or relevance can lead to skewed insights and poor outcomes. Guenter Sauter, PhD, is Senior Director of Technology Product Management at MarketScan, with extensive expertise in data, analytics, information integration and governance.
Sauter explains that pharmaceutical companies are coming to understand that the real value in data-driven insights lies not in collecting the largest datasets, but in ensuring their data is truly representative and of the highest quality. According to Sauter, high-quality data, particularly closed claims datasets, is desired for both its completeness and reliability, which enables researchers to make confident, well-informed decisions.
“What we have seen is that pharma companies, when they do use clinical data, it’s often very project specific. And there is no corresponding clinical data set where it’s one-size-fits-all that you can find on the claim side,” Sauter explains. “Depending on the disease, therapeutic area, or drug they are focusing on, different clinical data sets have different strengths.”
The foundation of data-driven decision-making within the pharmaceutical industry has long rested upon the quality and scope of claims data, providing researchers with a comprehensive view of patient journeys over time. Sauter explains that companies are still highly interested in this.
“They’re looking for closed claims data for a majority of their use cases, because they see the longitudinality and the completeness of the data,” he says.
For numerous applications, closed claims remain essential, delivering the reliable backbone on which further insights can be built.
“Having a closed claims data set at the center is critical, because then you have a complete core that is representative. So, even if your clinical data isn’t complete, you have a better understanding of the overall picture of a patient,” adds Sauter.
However, limitations of relying exclusively on claims data have become increasingly evident, as it often fails to provide the full clinical context – especially regarding specific outcomes or nuanced patient details. In response, the sector is seeing a shift toward “fit-for-purpose” datasets tailored for individual projects or studies.
“Clients are not planning to buy a linked dataset and saying this fits all our needs for the next five years. It’s much more project-specific – and, more importantly, study specific,” says Sauter.
Consequently, pharma companies are increasingly designing data strategies around flexibility, relevance, and precision, so that each dataset deployed is best suited to the research question at hand. There is a growing recognition of the limitations inherent in any single data source, and a corresponding rise in the integration of diverse, fit-for-purpose datasets. This progression not only supports better research outcomes but also reflects an industry-wide commitment to precision and adaptability in the face of increasingly complex healthcare questions.
A service-first approach in data engagement
At the core of this industry shift is a “service-first” approach, where clients increasingly begin with tailored services or studies that address unique research questions, rather than committing upfront to large-scale data purchases.
“When pharma companies engage with us, it’s because the project teams have a very specific need. They not only need specific data, but also skills,” says Sauter. “Companies come to us more in a services engagement first, and then a data engagement second – versus the other way around.”
This change reflects both the increasing complexity of research questions and the desire for bespoke support, with clients seeking expertise and collaboration before determining their data requirements. The result is a model that begins with service and consultative partnership, leading the way for data purchases that are more precise, relevant, and effective.
The role of AI in healthcare data research
AI presents both significant promise and significant challenges in healthcare data. To deliver reliable outcomes, AI must be designed for specific, well-defined tasks, with verifiable results and transparency for both research organizations and regulatory bodies. Sauter notes that deployment should begin with basic, well-understood questions before progressing to complex ones, ensuring that clients know where AI can be trusted and how results are validated.
“You can develop an AI model to test for a specific use case. The AI model answers only questions to that use case that you feel comfortable with, that you have tested, and that you want to answer,” explains Sauter.
Focusing on a single, clearly defined task makes the model’s logic transparent and builds confidence in its outputs, with results that are easy to explain to clinicians and regulators alike.
Sauter notes that building trust in AI relies on clear communication about its capabilities and limitations.
“It is important to be very transparent to a client, which questions you want to answer, and where do you have confidence? Because if you’re starting out and saying: ‘I can answer everything’. Then you don’t know, where is the confidence? Is it in simple questions? Have they trained more on simple questions or more complex questions? That’s the balance we’re trying to achieve.”
The importance of platforms for AI applications
The landscape of data provision in healthcare is rapidly shifting from standalone offerings to comprehensive, integrated platforms that foster greater openness and flexibility. In addition, platforms allow organizations to leverage their data across diverse tools and technologies, including AI.
Platforms such as Snowflake have become pivotal in this transformation by focusing on the data first rather than the analytics, which lets clients use analytics from any vendor or from Snowflake itself. This open, interoperable approach ensures clients are not locked into a single platform or approach and creates a favorable environment for innovation.
“If you want to innovate on AI, you do need a platform. You can’t just do this with the data itself,” says Sauter.
How MarketScan helps clients navigate RWD
The pharma sector’s future is being shaped by several clear trends. As discussed above, there is a shift from broad, undifferentiated data purchases to tailored, fit-for-purpose solutions. Alongside this come the growing integration of clinical and claims data, the adoption of project-first service models, and the rise of open, interoperable platforms that fuel AI innovation. Success in this new era will depend on adaptability, transparency, and collaboration.
As the industry embraces these changes, MarketScan, a Truven data solution, helps clients navigate this developing landscape. It provides representative, high-quality datasets that give research projects a strong foundation, and it supports clients not just with robust data but also with flexible engagement and a trusted, transparent service.
With more than 30 years’ expertise, MarketScan offers integrated databases that link rich social determinants of health (SDoH) information to comprehensive claims data, providing researchers with broad, demographically representative socioeconomic insights for diverse studies. Through ongoing innovation and a steadfast focus on client needs, MarketScan helps ensure that companies are equipped to meet the challenges and seize the opportunities of the evolving RWD landscape.
