Around the World in Three Data Integrations

Big data has been a recurring topic in medical research news for years now. It’s a topic that deserves our attention. Big data’s potential to revolutionize fields like genomics, and to advance precision medicine generally, is stunning. Today, though, much of the coverage is speculation. Robustly effective designer drugs for cancer, tailored to a patient’s genetic markers, remain an ideal that is likely decades away.

But if we adopt a broader conception of big data, one that includes the massive infrastructure supporting social media, the Internet of Things, and (potentially) interoperable health record platforms, real-world applications are not hard to find.

March 2015: Researchers at Stanford University recruit 11,000 subjects into a cardiovascular study in 24 hours using an app built on Apple’s open source ResearchKit framework.

September 2015: “By outfitting trial participants with wearables, companies are beginning to amass precise information and gather round-the-clock data in hopes of streamlining trials and better understanding whether a drug is working… So far, there are at least 299 such clinical trials using wearables, according to the National Institutes of Health’s records.” – Bloomberg News

July 2016: The National Cancer Institute introduces OpenGeneMed, “a portable, flexible and customizable informatics hub for the coordination of next-generation sequencing studies in support of precision medicine trials.”


One facet of these examples stands out. For all their diversity, projects that rely on big data rely just as much on collaboration.
Moving from genomes to biomarkers to disease risk models and personalized treatment requires more than one big dataset: it requires the integration of data from multiple systems that are secure, geographically separated, and disparately schematized.

Ten years ago, the ability to handle this task might have been seen as a leading-edge, if not commonly leveraged, feature of clinical technology. Today, software that cannot facilitate integration is doomed to obsolescence.

eCitizen of the Data World

What does this requirement mean for EDC? Simply put, those of us building data capture solutions need to look far beyond the “coordinator keying in vitals” use case. (Our solution for that use case had better already be rapid, reliable, and easier to execute than ever, considering the burdens placed on trial sites in 2017.) With “insight by integration” at the forefront of research strategies, we technologists should think of our system as a world traveler: one familiar with the laws of multiple countries, authorized to enter and leave those countries, and fluent in their languages and customs. In the world of data management, this means the ability to pass authentication to enter a source database, map the data to a target, and leave the source while maintaining data provenance.

As a long-standing promoter of open, standards-based interoperability, OpenClinica embodies this “world traveler.” The native language of OpenClinica’s EDC is the Clinical Data Interchange Standards Consortium Operational Data Model (CDISC ODM). This fact alone makes the OpenClinica data model an ideal cosmopolitan, instantly conversant with research peers around the globe. But holding fast to one standard is not sufficient; we need to be willing to learn new languages. By offering a well-documented web services API, OpenClinica makes it easy for its users to leverage RESTful web services, together with the OAuth 2.0 protocol, to systematically:

  • extract data from almost any third-party source (e.g. labs and imaging centers), and
  • associate each element of that data with the relevant Case Report Form (CRF) field.
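To make that authenticate-extract-map loop concrete, here is a minimal sketch in Python. Every endpoint path, parameter name, and OID in it is an illustrative placeholder of my own invention, not OpenClinica’s actual API contract; consult the OpenClinica web services documentation for the real paths and payloads.

```python
import json
import urllib.parse
import urllib.request

# Placeholder endpoints for illustration only.
TOKEN_URL = "https://edc.example.org/oauth/token"
API_BASE = "https://edc.example.org/api"

def fetch_token(client_id, client_secret):
    """Obtain an OAuth 2.0 bearer token via the client-credentials grant."""
    data = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
    }).encode()
    with urllib.request.urlopen(TOKEN_URL, data=data) as resp:
        return json.load(resp)["access_token"]

def map_lab_record(lab_record):
    """Associate each element of a third-party lab record with a CRF field OID."""
    # Placeholder OIDs; real OIDs come from your study's CRF definitions.
    field_map = {"hdl_mg_dl": "I_LIPID_HDL", "ldl_mg_dl": "I_LIPID_LDL"}
    return {oid: lab_record[src] for src, oid in field_map.items()
            if src in lab_record}

def push_to_edc(token, subject_key, items):
    """POST the mapped items to the EDC, authenticating with the bearer token."""
    body = json.dumps({"subjectKey": subject_key, "items": items}).encode()
    req = urllib.request.Request(
        f"{API_BASE}/clinicaldata/import",
        data=body,
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

The mapping step is the heart of the pattern: once `map_lab_record` translates a source record into OID-keyed values, the same push logic serves any lab or imaging feed.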

APIs and authentication protocols offer the most direct route to turnkey integration. But it’s not enough to be powerful in the pursuit of data integration. A system has to be flexible, too, when tapping data sources that aren’t accessible via an API. For OpenClinica, this means providing a host of configurable tools to data managers and data entry personnel.

  • OpenClinica’s Jobs feature allows for custom imports from local files. A Job may be scheduled to run at any frequency, so that users responsible for data entry based on a regularly updated flat file (e.g. a CSV on their hard drive) may provide that data without keying in each element. Well defined and set up just once, a Job improves accuracy and saves hours of research time.
  • An Import Data feature makes ad hoc batch uploads easy, as well. Users simply generate an XML file based on OpenClinica-supplied Object Identifiers (OIDs) to map data from the import file to the EDC.
  • OpenClinica supports a variety of Single Sign On (SSO) protocols, reducing repetitive authorization while maintaining security. OpenClinica is also an early and already experienced adopter of SMART on FHIR, a set of open specifications to integrate its core EDC with Electronic Medical Records (EMR) and other health IT systems.
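The OID-based import mapping is easiest to see with an example. The sketch below builds a CDISC ODM-style import document from a small CSV. The OIDs (S_DEMO, SE_BASELINE, F_VITALS, and so on) are placeholders I have assumed for illustration; in a real study, OpenClinica generates the OIDs and lists them in the study metadata.

```python
import csv
import io
import xml.etree.ElementTree as ET

def rows_to_odm(rows):
    """Build a CDISC ODM import document from an iterable of row dicts."""
    odm = ET.Element("ODM")
    clin = ET.SubElement(odm, "ClinicalData",
                         StudyOID="S_DEMO", MetaDataVersionOID="v1.0.0")
    for row in rows:
        # One nested OID path per row: subject -> event -> form -> item group.
        subj = ET.SubElement(clin, "SubjectData", SubjectKey=row["subject"])
        event = ET.SubElement(subj, "StudyEventData", StudyEventOID="SE_BASELINE")
        form = ET.SubElement(event, "FormData", FormOID="F_VITALS")
        group = ET.SubElement(form, "ItemGroupData",
                              ItemGroupOID="IG_VITALS", TransactionType="Insert")
        ET.SubElement(group, "ItemData", ItemOID="I_VITALS_SBP",
                      Value=row["systolic_bp"])
    return ET.tostring(odm, encoding="unicode")

# A regularly updated flat file, as in the Jobs scenario above.
csv_text = "subject,systolic_bp\nSS_001,120\nSS_002,134\n"
odm_xml = rows_to_odm(csv.DictReader(io.StringIO(csv_text)))
```

Whether a Job runs this conversion on a schedule or a data manager performs an ad hoc upload, the resulting XML carries the same OID-to-value mapping, so nothing is ever keyed in by hand.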

A Look at Our Passport

So far, I’ve outlined a set of capabilities required of any EDC in 2017, and claimed that OpenClinica meets them all. But where’s the evidence? In the second half of this post, I’m going to put three of our partners in the spotlight. For each, OpenClinica was able to play a pivotal role in bringing together multiple data sources.

The Dutch Translational Research IT (TraIT) project, an initiative from the Center for Translational Molecular Medicine (CTMM), “enables integration and querying of information across the four major domains of translational research: clinical, imaging, biobanking and experimental (any-omics).” While multiple systems power that integration, OpenClinica is the central hub. TraIT continues to host and support https://www.openclinica.nl, which launched with 10 trials on the platform in October 2011. By March 2015, adoption had grown to 852 users at 157 sites conducting 136 studies; by October 2016, usage had grown to more than 2,800 researchers and 250 research projects.

Among the selection criteria used to evaluate and ultimately select OpenClinica as a partner, TraIT specifically cited:

  • “links to other data storage and analysis tools within the TraIT platform, allowing researchers to integrate and analyse case report data, imaging data, experimental data and bio banking information,” and…
  • the “possibility to integrate with Trusted Third Party which handles proper (de-)identification of participant data within OpenClinica and other tools/services used in TraIT.”

It is worth noting that, in addition to an infrastructure that allows database integration, TraIT relies equally on OpenClinica’s open source model to build custom integrations. “The advantage of the Open Source model compared to a proprietary model, is that multiple independent contributors can review the source code, making enhancements which are then added to the version available to the entire OpenClinica community.”

Usage by the broader community helps ensure the innovation’s longevity and continued evolution. TraIT leverages these tools (such as the OC Data Importer) to help their sites import vast quantities of data in bulk, eliminating transcription errors and delays.

The 100,000 Genomes Project, led and funded by Genomics England, is another example of a large-scale effort to combine clinical and genomic data. The 100,000 Genomes Project is sequencing 100,000 genomes in order to:

  • better diagnose rare disease,
  • understand its causes, and
  • set a direction for research on effective treatment.

Whole genome sequencing (WGS) offers the best hope for determining which genetic mutations give rise to particular phenotypes, including disease states. WGS yields the textual equivalent of the roughly three billion nucleotide base pairs that make up a single copy of one individual’s genome, so a research program involving even one such sequencing has already entered the territory of “big data.” While highly specialized systems are responsible for the sequencing itself, and still others for the analysis of its output, an equally essential tool for this research is a system that can manage the clinical data and biospecimen tracking of subjects visiting one of several geographically dispersed clinical centers. Here, too, OpenClinica serves as the hub. Researchers at 13 NHS Genomic Medicine Centres are using OpenClinica to register participants, capture clinical information, and ensure that blood samples stay matched with their de-identified contributors.

Project leaders have published a 10-page guide for researchers on this process, one whose brevity and clarity speak to how easy OpenClinica makes it. Thanks to the dedication of the researchers, the collaboration of participants, and the fitness of the technology, the project is on track for completion in 2017.


PECARN, the Pediatric Emergency Care Applied Research Network, is the first federally funded pediatric emergency medicine research network in the United States. To date, PECARN has conducted 24 studies that have already changed how clinicians prevent and manage acute illness and injury in children.

As part of their mission to advance clinical practice, PECARN has taken a lead role in the implementation and study of clinical decision support tools. For all the potential benefit offered by these tools, questions remain about their adoption and effectiveness. Do physicians and nurses generally follow evidence-based recommendations for treatment or diagnostic procedures? When they do, are outcomes improved?

To help answer these questions, PECARN study leaders conducted a nonrandomized trial with concurrent controls at thirteen emergency departments between November 2011 and June 2014. These thirteen departments were consolidated into ten research sites. At eight of these sites, clinicians creating an EHR record for any patient <18 years old with minor blunt head trauma were automatically presented with a custom template. This template solicited additional data about the injury before providing recommendations on CT use and risk estimates of clinically important traumatic brain injuries. (CT imaging of the brain is associated with a non-negligible risk of tumor formation in those who undergo the procedure, especially children. At the same time, early detection of ciTBI–i.e. injuries leading to death or requiring neurosurgery, intubation for more than 24 hours, or hospital admission for two or more nights–is critical for effective intervention. The recommendations provided by the EHR template were intended to limit CT use to those patients who met established predictive criteria for significant ciTBI risk.)

The clinicians’ work in their EHR, together with subsequent cranial imaging and TBI-related outcomes, generated data that would require aggregation to determine (1) how frequently care providers heeded recommendations surrounding CT use, and (2) whether the predictive rules for ciTBI risk were valid. That aggregation fell to OpenClinica. By accepting reports generated by each site’s EHR to automatically create study subjects, and by integrating with the source of imaging data at each site, OpenClinica enabled a true e-source study that left clinical workflows unaffected. Not one of the 28,669 subjects created in the study database required manual data entry.
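As a rough illustration of that e-source pattern, the sketch below turns a hypothetical de-identified EHR report into study-subject payloads ready for automated submission. The report shape, field names, and OIDs are all assumptions made for illustration; the actual PECARN feeds and study build are not public.

```python
import json

# Hypothetical shape of a site's EHR-generated report: de-identified
# records for patients under 18 with minor blunt head trauma.
ehr_report = json.loads("""
[
  {"mrn_hash": "a1f3", "age_months": 96, "gcs": 15, "ct_ordered": false},
  {"mrn_hash": "b7c2", "age_months": 30, "gcs": 14, "ct_ordered": true}
]
""")

def to_study_subjects(report):
    """Turn each de-identified EHR record into a study-subject payload."""
    subjects = []
    for rec in report:
        subjects.append({
            # Derive a subject key from the hashed identifier, so the
            # record stays linked to its de-identified contributor.
            "subjectKey": f"SS_{rec['mrn_hash'].upper()}",
            "enrollmentData": {
                "I_AGE_MONTHS": rec["age_months"],
                "I_GCS": rec["gcs"],
                "I_CT_ORDERED": int(rec["ct_ordered"]),
            },
        })
    return subjects

subjects = to_study_subjects(ehr_report)
# Each payload would then be posted to the EDC's subject-creation API,
# so no site personnel ever key the data in manually.
```

The point of the sketch is the division of labor: the EHR report is the sole source of truth, and the integration layer does nothing but reshape it, which is what keeps the clinical workflow untouched.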

Images courtesy of Jeff Yearley, BA, Manager of Clinical Data Management, Data Coordinating Center, University of Utah. Click here to download the slides containing the images above.

The moral? Big data isn’t just found: it’s made, through the coordinated efforts of both people and systems that travel light and fast. You’re contributing to big data during more and more of your waking hours these days. If you want to help shape it through technology, get ready to cooperate… and pack your digital bags.
