Validating Open Source eClinical Systems

Validating software you use in clinical research is a requirement of most regulatory authorities, such as the FDA under 21 CFR Part 11. It’s also generally considered a good practice that ensures a system adequately meets your needs. However, validating software can be a confusing topic to many.

In her white paper, “Validating Open Source eClinical Systems,” validation expert Laura Keita articulates the validation process and responsibilities, and simplifies the principles of validation by focusing on the core mantra: “plan it, do it, prove it.” The paper also looks at validation strategies for open source eclinical software, and at how the FDA has acknowledged the impact the open source model has on improved software quality, all part of the spirit of validation.

eClinical Integration

Increasingly I am seeing real momentum for reducing the costs and barriers to integration of eclinical applications and data in a way that benefits users.

A great example is a recent LinkedIn discussion (you may need to join the group to read it). Several software vendors and industry experts engaged in a dialogue about the pros and cons of different integration approaches. There is an emerging consensus that integration approaches should adopt open web standards and harness the elegance and flexibility of the CDISC Operational Data Model. This consensus may signal a sea change in attitudes to standards-based integration that makes it the norm rather than the exception.

This is not new to members of the OpenClinica community. Over the years we’ve had many examples of such integration efforts described on this blog and at OpenClinica conferences. To make such efforts more powerful, reusable, and robust, the OpenClinica team has invested a great deal over the past year to create a meaningful, CDISC ODM-based model for interacting with OpenClinica. We have incorporated open web standards (RESTful APIs for transport and OAuth for security) to make the interfaces easily accessible with commonly used software tools.  This is part of a newly published resource for OpenClinica development and integration, the OpenClinica 3.1 Technical Documentation Guide. The first version of the specification can be viewed at https://docs.openclinica.com/3.1/technical-documents/rest-api-specifications. I’ve reproduced the introduction here:

Overview

We are constantly looking at ways to make it possible (not to mention reliable and easy!) for users and developers to interact with and extend OpenClinica in a programmatic way. This can mean anything from data loading to more meaningful integrations of applications common to the clinical research environment.

As proponents of open, standards-based interoperability here at OpenClinica, our starting point is always to develop interfaces for these interactions based on the most successful, open, and proven methods in the history of technology – namely the protocols that power the World Wide Web (such as HTTP, SSL, XML, OAuth 2.0). They are relatively simple, extensively documented, widely understood, and well-supported out of the box in a large number of programming and IT environments. On top of this foundation, we rely heavily on the wonderful work of CDISC and the CDISC ODM to model and represent the clinical research protocol and clinical data.

This chapter describes a CDISC ODM-based way to interact with OpenClinica using RESTful APIs and OAuth. The REST web services API relies on HTTP, SSL, XML, and OAuth 2.0. This architecture makes the ODM study protocol representation for an OpenClinica study available and supports other interactions for study design.
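To make the combination of HTTP, SSL, XML, and OAuth 2.0 a little more concrete, here is a minimal sketch of what a client request might look like. Everything specific in it is an assumption for illustration: the host name, the `/metadata/` path, the study OID, and the use of a bearer-style access token are all hypothetical, and the request is only built, never sent. The actual URLs and authorization flow are defined in the technical subchapters of the specification.

```python
from urllib.request import Request

# Hypothetical values for illustration only. The real base URL, resource path,
# and token come from your OpenClinica instance and the REST API specification.
BASE_URL = "https://oc.example.org/OpenClinica/rest"
ACCESS_TOKEN = "example-access-token"


def build_study_metadata_request(study_oid: str, token: str = ACCESS_TOKEN) -> Request:
    """Build (but do not send) a GET request for a study's ODM metadata."""
    url = f"{BASE_URL}/metadata/{study_oid}"  # hypothetical path
    return Request(
        url,
        method="GET",
        headers={
            # Assumes a bearer-style OAuth 2.0 access token.
            "Authorization": f"Bearer {token}",
            # ODM is exchanged as XML.
            "Accept": "application/xml",
        },
    )


request = build_study_metadata_request("S_STUDYA")
print(request.get_method(), request.full_url)
```

Because the URL uses `https`, the transport would be protected by SSL/TLS; the token in the `Authorization` header is what an OAuth 2.0 authorization step would hand to the third-party application.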

Why REST?

The OpenClinica RESTful architecture was developed to (initially) support one particular use case, but with the intention of becoming more broadly applicable over time. This use case is based on a frequent request of end users: for OpenClinica to support a visual method for designing, editing, and testing “rules” which define edit checks, email notifications, skip pattern definitions, and the like to be used in OpenClinica CRFs. Users have had to learn how to write rules in XML, which can be confusing and has a steep learning curve for non-technical individuals. The OpenClinica Rule Designer is an application that allows end users to build cross-field edit checks and dynamics within a GUI-based application. It is a centrally hosted, Software as a Service (SaaS) based application available for OpenClinica Enterprise customers at https://designer.openclinica.com.

To support interaction of the centrally hosted rule designer with any instance of OpenClinica Enterprise installed anywhere in the world, we needed to implement a secure protocol and set of API methods to allow exchange of study information between the two systems, and do so in a way where the user experience was as integrated as if these applications were part of the same integrated code base. In doing so, and by adopting the aforementioned web and clinical standards to achieve this, we have built an architecture that can be extended and adapted for a much more diverse set of uses.

This chapter specifies how third-party applications can interact with an OpenClinica instance via the REST API and OAuth security, and details the currently supported REST API methods. The currently supported API methods are not comprehensive, and you may get better coverage from our SOAP API. However, the OpenClinica team is continuing to expand this API, and since it is open source anyone may extend it to add new methods to meet their own purposes. If you do use the API in a meaningful way or if you extend the API with new methods, please let others know on the OpenClinica developers list (developers@openclinica.org), and submit your contributions for inclusion back into the codebase – you’ll get better support, increased QA, and compatibility with future OpenClinica releases.

RESTful Representation, based on ODM

“REST”, an acronym for REpresentational State Transfer, describes an architectural style that allows definition and addressing of resources in a stateless manner, primarily through the use of Uniform Resource Identifiers (URIs) and HTTP.

From Wikipedia: A RESTful web service (also called a RESTful web API) is a simple web service implemented using HTTP and the principles of REST. It is a collection of resources, with three defined aspects:

  • the base URI for the web service, such as http://example.com/resources/
  • the Internet media type of the data supported by the web service. This is often JSON, XML or YAML but can be any other valid Internet media type.
  • the set of operations supported by the web service using HTTP methods (e.g., POST, GET, PUT or DELETE).

REST is also a way of looking at the world, as eloquently articulated by Ryan Tomayko.

In the context of REST for clinical research using OpenClinica, we can conceptually think of an electronic case report form (CRF) as a resource that is essentially a bunch of metadata modeled in CDISC ODM with OpenClinica extensions:

  • Some of this metadata (data type, item name, response set, etc) is intrinsic metadata – i.e. tied to the definition of the CRF and its items and mostly unchangeable after it is initially defined.
  • Some of this metadata is representation metadata and used only when the CRF is represented as a web-based HTML form (in the OpenClinica database schema we call this form_metadata, but it also can include other things like CRF version information and rules).
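One way to picture the split between intrinsic and representation metadata is as two separate data structures attached to the same item. The sketch below is purely illustrative: the class and field names are hypothetical and do not reflect the OpenClinica database schema or the ODM extensions.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass(frozen=True)
class IntrinsicItemMetadata:
    """Tied to the item's definition; frozen to mirror 'mostly unchangeable'."""
    item_name: str
    data_type: str
    response_set: Tuple[str, ...] = ()


@dataclass
class RepresentationMetadata:
    """Used only when the item is rendered, e.g. as a web-based HTML form."""
    crf_version: str
    label: str
    widget: str = "text"


@dataclass
class CrfItem:
    """One item: a single intrinsic definition plus any number of representations."""
    intrinsic: IntrinsicItemMetadata
    representations: Dict[str, RepresentationMetadata] = field(default_factory=dict)

    def add_representation(self, rep: RepresentationMetadata) -> None:
        # Representations are keyed by CRF version (a simplifying assumption).
        self.representations[rep.crf_version] = rep


item = CrfItem(IntrinsicItemMetadata("SEX", "text", ("M", "F")))
item.add_representation(RepresentationMetadata("v1", "Sex", widget="radio"))
item.add_representation(RepresentationMetadata("v2", "Sex at birth", widget="select"))
print(sorted(item.representations))
```

The point of the design is that adding a second representation never touches the intrinsic definition, so the same item can be rendered differently without being redefined.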

An OpenClinica Event CRF is that same bunch of metadata with the corresponding item data, plus references to the study subject, event definition, CRF version, event ordinal, etc that it pertains to.

  • The notion of a CRF version pertains to the representation of the CRF. It is not intrinsic to the event CRF (this is debatable but it is how OpenClinica models CRFs). Theoretically you should be able to address and view any Event CRF in any available version of the CRF (e.g. http://oc/RESTpath/StudyA/Subj1234/VisitA/FormB/v1/edit and http://oc/RESTpath/StudyA/Subj1234/VisitA/FormB/v2/edit both show you the same data represented in different versions of the CRF). Of course the audit history needs to clearly show which version/representation of the CRF was used for key events such as data capture, signature, etc.
  • Rules are also part of the representation metadata as opposed to intrinsic metadata, even though you don’t need to specify them on a version-by-version basis.
  • Anything attached to the actual event CRF object or its item data – discrepancy notes, audit trails, signatures, SDV performance, etc is part of that event data and should be addressable in the same manner (e.g. http://oc/RESTpath/StudyA/Subj1234/VisitA/FormB/v1/GROUPOID/ORDINAL/ITEM…)
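The addressing scheme in these bullets can be sketched as a small helper that assembles a path from its parts. This is only an illustration of the conceptual pattern: the `http://oc/RESTpath` base is the placeholder used in the text, the function name is hypothetical, and the actual URLs are defined elsewhere in the specification.

```python
from urllib.parse import quote

# Conceptual base taken from the examples in the text, not a real endpoint.
BASE = "http://oc/RESTpath"


def event_crf_path(study, subject, event, form, version, *item_address, action=None):
    """Build a conceptual event CRF URL.

    item_address optionally narrows the address to (group OID, ordinal, item OID);
    action optionally appends a verb-like segment such as "edit".
    """
    segments = [study, subject, event, form, version, *item_address]
    if action:
        segments.append(action)
    # Percent-encode each segment so identifiers cannot break the path.
    return BASE + "/" + "/".join(quote(str(s), safe="") for s in segments)


print(event_crf_path("StudyA", "Subj1234", "VisitA", "FormB", "v1", action="edit"))
print(event_crf_path("StudyA", "Subj1234", "VisitA", "FormB", "v1", "GROUPOID", 1, "ITEMOID"))
```

Swapping `"v1"` for `"v2"` in the first call addresses the same event CRF data through a different version of the form, which is exactly the separation between data and representation described above.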

In this conceptual view of the world, CRFs (as well as CRF items, studies, study events, etc.) are RESTful resources with core, intrinsic properties and then some other metadata that has to do with how they are presented in a particular representation. We now have a model that allows us a great deal of flexibility and adaptability. We can support multiple modalities, with different representation metadata for rendering the same form, or perhaps the shared representation metadata but applied in a different way. We can address any part of the CRF in an atomic manner. This approach has been successfully applied in the Rule Designer, which takes the ODM study metadata and allows users to browse the study CRFs and items, with the ability to drag and drop those resources into rule expressions. Here are some examples of additional future capabilities that could be easily realized on top of this architecture:

  • Multiple data entry modalities – a user may need to deploy patient based data entry via web, a tablet, a thick client, or even paper/OCR, each with a very different presentation. Each of these may be part of OpenClinica-web or a separate application altogether, but all will rely on the same resource metadata to represent the CRF (according to the UI + logic appropriate for that modality), and use the same REST-based URL and method for submitting/validating the data.
  • Apply a custom view (an XSL or HTML/CSS) to a patient event CRF or full casebook – some uses of this could be to represent it as a PDF casebook, show it with all audit trails/DNs embedded in line with the CRF data, show a listing of data for that subject, or provide it (via an XSL mapping) as an XForm or HL7 CCD document for use by another application – http://oc/RESTpath/StudyA/Subj1234/VisitA/FormB/v1/view?renderer=somemap…
  • The same path used in the URLs, e.g. http://oc/RESTpath/StudyA/Subj1234/VisitA/FormB/v1/GROUPOID/ORDINAL/ITEMOID, could be used as the basis for XPath expressions operating on ODM XML representations of CRFs and of event CRF data.
  • Internationalization – OpenClinica ought to allow our CRF representation metadata to have an additional sub-layer to render the form in different languages, and then automatically show the appropriate language based on client/server HTTP negotiation (like we do with the rest of the app). Currently internationalization of CRFs requires versioning the CRF.
  • View CRF & Print CRF – use the same representation metadata (form metadata) but apply slightly different rules on how the presentation works (text values instead of form fields, no buttons, turn drop down lists into text values)
  • Discrepancy manager popup – one requested use case would allow a user to update a single event CRF item data value directly from the discrepancy note UI point of view. In this case you could think of just updating that one item as addressing the resource http://oc/RESTpath/StudyA/Subj1234/VisitA/FormB/v1/GROUPOID/ORDINAL/ITEM…. In this model, whatever rules and presentation metadata need to get applied at presentation and save time happen automatically.
  • Import of CDISC ODM XML files – imported data would be processed through the same model, but only use the metadata that’s relevant to the data import modality. Same for data coming in as raw ODM XML via a REST web service. A lot of times the import only populates one part of a CRF and the other parts are expected to be finished via data entry. This model would help us manage that process better than the current implementation of ODM data import.
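As a rough illustration of the XPath idea in the list above, the sketch below maps the conceptual URL segments onto the element hierarchy of a generic ODM 1.3 clinical data document and looks up a single item value. The sample document, its OIDs, and the `odm_xpath` helper are invented for the example; it uses only generic ODM elements (no OpenClinica extensions), and it leaves out the CRF version segment because plain ODM clinical data has no separate level for it.

```python
import xml.etree.ElementTree as ET

# Namespace of generic ODM 1.3 documents.
NS = {"odm": "http://www.cdisc.org/ns/odm/v1.3"}

# Invented sample data; OIDs mirror the conceptual URL in the text.
SAMPLE_ODM = """
<ODM xmlns="http://www.cdisc.org/ns/odm/v1.3">
  <ClinicalData StudyOID="StudyA" MetaDataVersionOID="v1">
    <SubjectData SubjectKey="Subj1234">
      <StudyEventData StudyEventOID="VisitA">
        <FormData FormOID="FormB">
          <ItemGroupData ItemGroupOID="GROUPOID" ItemGroupRepeatKey="1">
            <ItemData ItemOID="ITEMOID" Value="172"/>
          </ItemGroupData>
        </FormData>
      </StudyEventData>
    </SubjectData>
  </ClinicalData>
</ODM>
""".strip()


def odm_xpath(study, subject, event, form, group, ordinal, item):
    """Translate conceptual URL segments into an XPath relative to the ODM root."""
    return (
        f"odm:ClinicalData[@StudyOID='{study}']"
        f"/odm:SubjectData[@SubjectKey='{subject}']"
        f"/odm:StudyEventData[@StudyEventOID='{event}']"
        f"/odm:FormData[@FormOID='{form}']"
        f"/odm:ItemGroupData[@ItemGroupOID='{group}'][@ItemGroupRepeatKey='{ordinal}']"
        f"/odm:ItemData[@ItemOID='{item}']"
    )


root = ET.fromstring(SAMPLE_ODM)
path = odm_xpath("StudyA", "Subj1234", "VisitA", "FormB", "GROUPOID", 1, "ITEMOID")
item = root.find(path, NS)
value = item.get("Value") if item is not None else None
print(value)
```

The standard library's `ElementTree` supports only a subset of XPath, which happens to be enough for this attribute-based addressing; a fuller XPath engine would be needed for anything more elaborate.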

There are many considerations related to user roles and permissions, workflows, and event CRF/item data status attributes that need to be overlaid on top of this REST model, but the model itself is a conceptually useful way to think about clinical trials and the information represented therein. When implemented using CDISC ODM XML syntax it becomes even more powerful. As widespread support for ODM becomes the norm, the barriers to true interoperability – shared, machine-readable study protocol definitions, and robust, real-time, ALCOA-compliant exchange of clinical data and metadata that aligns with users’ business processes – get eviscerated.

* This chapter frequently refers to ODM-based representations of study metadata and clinical data in OpenClinica. We strive as much as possible to implement ODM-based representations of OpenClinica metadata and data according to the generic ODM specifications (currently using ODM version 1.3). However, to ensure our representations support the full richness of information used in OpenClinica we often have to rely on ODM’s vendor extensions capability. We have not always made distinctions in this chapter as to where we are using ‘generic’ ODM versus OpenClinica extensions, but that is documented here. It is our goal as ODM matures and supports richer representations of study information to migrate our extensions back into the generic ODM formats.

** Also note the RESTful URL patterns referred to above are conceptual. Refer to the technical subchapters of this REST API specification for the actual URLs.

The spec (like much of the code that implements it) is open source. I’m looking forward to hearing comments and feedback, and sharing thoughts on how we can encourage broader adoption across different types of eclinical applications.

An Opportunity for Transformational Change in Clinical Trials

Life sciences research is recognized as one of the most technologically advanced, groundbreaking endeavors of modern times. Nevertheless until very recently the preferred technology for executing the most critical, costly stage of the R&D process – clinical trials – has been paper forms. Only in 2008 did adoption of electronic alternatives to paper forms take place in more than half of new trials. This recent uptick in adoption rates is encouraging, but further transformational change in the industry is necessary to fully realize the promise of Electronic Data Capture (EDC) and associated “eClinical” technologies. Two developments that could provide the framework for such change are adoption of open data standards and the use of Open Source Software.

Data standards provide uniform ways to represent information or processes within a specific frame of reference and according to a detailed specification. A standard is “open” when it is not encumbered by patent, cost, or usage restrictions. Open Source Software (OSS) is defined loosely as software that allows programmers to openly read, redistribute, and modify the source code of that software. The combination of OSS and open standards is a proven way to deliver improved flexibility, quality, and efficiency.

A community-driven open source offering that harnesses open standards can produce robust, innovative technology solutions for use in regulated clinical trial environments. Most Open Source Software is built using a collaborative development model. The OSS development and licensing model encourages experimentation, reduces ‘reinvention of the wheel’, and allows otherwise unaffiliated parties to build on the work of others. The result is that OSS can become a key driver of increased IT efficiency and a way to wring out unnecessary costs. In many cases, users can have the best of all worlds: the ability to adopt software rapidly and at low cost, the flexibility to develop and extend their systems as they choose, and the ability to reduce risk by obtaining paid commercial-grade support.

As clinical research struggles to become more automated and efficient, we need to rely on interoperable systems to meet challenges of flexibility, quality, and speed. The OSS development model also naturally leads to the adoption of well-documented, open standards. Because OSS product designers and developers tend to reuse successful components and models where available, OSS technologies are often leading implementers of standards. For example, the National Cancer Institute’s Cancer Bioinformatics Grid (caBIG) initiative is “designed to further medicine’s potential through an open source network” based on open data standards and infrastructure that support sharing of heterogeneous data. This remarkable effort aims to connect large networks of researchers in ways that enable efficient re-use of data, eliminate duplicate systems, and enable new types of translational research.

In industry-sponsored clinical trials, standards such as the CDISC Operational Data Model (ODM), Clinical Data Acquisition Standards Harmonization (CDASH), and Study Data Tabulation Model (SDTM) have gained adoption in both proprietary and OSS software platforms. In some cases, standards are mandated for regulatory submission and reporting (SDTM, clinicaltrials.gov) and obviously must be adopted. In other cases, standards such as ODM, CDASH, and general web standards such as web services and XForms tend to be adopted to the degree they have a compelling business case.

The business case for standards centers on increasing accuracy and repeatability, enabling reuse of data, and enhancing efficiency by use of a common toolset. A well-designed standard does not inhibit flexibility, but presupposes idiosyncrasies and allows extension to support ‘corner cases’. Leading industry voices share compelling arguments for how to use standards such as ODM, CDASH, XForms, and Web Services to achieve these goals. Though the details are complicated, the approach offers orchestration of disparate applications and organization of metadata across multiple systems. There is change control support and a single ‘source of truth’ for each data point or study configuration parameter, so when study designs change (as they inevitably do) or a previously committed data point is rolled back, the change is automatically shared and manual updates to systems are not necessary. Because the ODM, CDASH, and SDTM are used as a common “language”, the systems know the meaning and structure of data and can process transactions accordingly. Here’s a tangible example:

Lets imagine an IVR system wanted to check with an EDC system if a subject was current in a study (current meaning not dropped out, early terminated or a screen failure).  A Web Service could be offered by the EDC system to respond with a ‘True’ or ‘False’ to a call ‘IS_SUBJECT_CURRENT’ ?  Of course hand-shaking would need to occur before it hand [sic] for security and so on, but following this, the IVR system would simply need to make the call, provide a unique Subject identifier, and the EDC system web service would respond with either ‘True’ or ‘False.  With Web Services, this can potentially occur in less than a second.

Electronic Data Capture – Technology Blog, September 28, 2008
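The logic behind the quoted ‘IS_SUBJECT_CURRENT’ call is small enough to sketch. The code below is a stand-in for what the EDC side of such a service might compute, under stated assumptions: the status names, the in-memory subject registry, and the function name are all hypothetical, and the transport and the security hand-shaking mentioned in the quote are deliberately left out.

```python
from enum import Enum


class SubjectStatus(Enum):
    ENROLLED = "enrolled"
    DROPPED_OUT = "dropped out"
    EARLY_TERMINATED = "early terminated"
    SCREEN_FAILURE = "screen failure"


# Per the quoted definition: current means not dropped out,
# not early terminated, and not a screen failure.
NOT_CURRENT = {
    SubjectStatus.DROPPED_OUT,
    SubjectStatus.EARLY_TERMINATED,
    SubjectStatus.SCREEN_FAILURE,
}

# Hypothetical stand-in for the EDC system's subject records.
SUBJECTS = {
    "SUBJ-001": SubjectStatus.ENROLLED,
    "SUBJ-002": SubjectStatus.SCREEN_FAILURE,
}


def is_subject_current(subject_id: str) -> bool:
    """Answer the IVR system's question for one unique subject identifier."""
    status = SUBJECTS.get(subject_id)
    if status is None:
        # An unknown subject is an error, not a 'False' answer.
        raise LookupError(f"unknown subject: {subject_id}")
    return status not in NOT_CURRENT


print(is_subject_current("SUBJ-001"), is_subject_current("SUBJ-002"))
```

Treating an unknown identifier as an error rather than as ‘False’ is a design choice made here for the sketch; it keeps the IVR system from confusing “not in the study” with “no longer current”.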

While this integration requirement could be satisfied by development of point-to-point, proprietary interfaces, this approach is brittle, costly, and does not scale well to support a third or fourth-party system participating in the transaction. It is critical that standards be open so that parties can adopt and implement them independently, and later interface their systems together when the business case calls for it. A leading industry blogger makes the case for the openness of standards within the ODM’s ‘Vendor Extension’ architecture: “The ODM is an open standard, the spec is available for free and anyone can implement it. This encourages innovation and lowers the barriers to entry and therefore costs. Vendor Extensions are not open, the vendor is under no obligation to share them with the market and the effect is that meta-tools and inter-operability are held back.”

Having the software that implements these standards released as open source code only strengthens its benefits. Proprietary software can implement open standards; however, given the proprietary vendor’s business interest in locking in license revenue, might the vendor be tempted into tweaking or ‘extending’ the standard in a way that is encumbered to lock users into their platform? This strategy of “embrace, extend, extinguish” was made famous in the Microsoft anti-trust case of the 1990s, where it came to light that the company attempted to apply these principles to key Internet networking protocols, HTML and web browser standards, and the Java programming language. They hoped to marginalize competing platforms that did not support their “extended” versions of the standards. Thankfully, they had limited success in this effort, and the Internet has flourished into the open, constantly innovating, non-proprietary network that we know today. The eClinical technology field is at a similar crossroads. By embracing open standards, and working concertedly to provide business value in re-usable OSS technology, we can achieve a transformation in the productivity of our clinical technology investments.

Why Open Source is Good for International Health Research (and Everyone Else)

A recent article titled “Could an Open-Source Clinical Trial Data-Management System Be What We Have All Been Looking For?”, published in PLoS (Public Library of Science), proposes that “international health research organisations combine their efforts and spending power and assist with the development of systems that are open to all.” This is a bold statement with, in my opinion, solid rationale.

The authors, Greg W. Fegan and Trudie A. Lang, manage numerous clinical trials for the Kenya Medical Research Institute–Wellcome Trust Collaborative Research Programme in Kilifi, Kenya. Like many other research organizations in developing countries, their work largely focuses on finding treatments for “neglected diseases” such as malaria, hookworm, and encephalopathies. They clearly communicate the inability of proprietary eClinical software to be a widely usable solution in such settings due to costly and restrictive licensing.

However, Fegan and Lang define the appeal of open source as something greater than financial savings (although this is a strong motivation). In addition to freedom from license fees, open source clinical trial software built with open components and open standards is more “modifiable and amenable for use with existing software already employed.” Perhaps the most significant point made is that open source can be a more powerful way to promulgate standards and better leverage the collective efforts of disparate research institutions.
 

Indeed, the authors also point out that the impact of a well designed and supported open source eClinical system “can be beneficial to all clinical researchers” and urge “international health research organisations to combine their efforts and spending power and assist with the development of systems that are open to all and truly fit for purpose.”
 

The paper closes with the following call to action:

“Research organisations and funders should combine efforts to produce an open-source solution for trial data management. A shared platform could then be easily established, and would bring wider benefits such as electronic submission to regulators, automated sharing of data, and contribution to important public databases such as pharmacovigilance and drug-monitoring registries.

We believe that an open-source approach to a truly designed-for-purpose data-management system for clinical trials is attractive. Such a system would save money by eliminating the reliance on the use of expensive database software systems and their administrators. This would empower and enable a wider variety of people to conduct trials, as the question of capturing, cleaning, and extracting data would not be overly daunting or expensive. This point is significant, as it may encourage more investigators in resource-poor settings to take part in high-standard research that would otherwise be out of reach and beyond their capacity. Surely this would increase the scope and variety of trials that are conducted. Our hope for this article is that it will begin a debate on this topic, and lead to a concerted effort to lobby the international research and donor community to make sure this barrier to trial conduct is understood and addressed.”

I encourage you to read the full article online at the PLoS website.

An open source Electronic Data Capture (EDC) system? Really?

Back in 2004, when I would tell people about our open source electronic data capture (EDC) technology and our open source business model, I got a lot of crazy looks and confused reactions.

Fortunately, these days there is a much greater understanding in our industry of what open source software is and of the significance of its ability to solve core problems that proprietary software cannot. However, we still have to try very hard to make sure our users understand what is different about open source and what they should expect. Over the course of my next few posts I’ll explore some aspects of how open source matters in clinical research informatics. Some of the ideas I’ll be exploring include (note this list may change, don’t hold me to it):

  • Validation and Compliance
  • Security
  • Support and reliability
  • Customization
  • Cost

Electronic Data Capture (EDC) Spending on the Rise

Health Industry Insights, an IDC Company, recently published a new forecast of the domestic market for electronic data capture (EDC) software. The report, titled “U.S. Electronic Data Capture 2006-2011, Spending Forecast and Analysis,” predicts that spending on EDC solutions will total more than $3.1 billion by 2011. This represents an average annual market growth rate of about 15% between now and then.

It is interesting to consider some of the primary factors that are likely driving this trend:

  1. New products to fuel growth. As pharmaceutical companies’ blockbuster drugs run off-patent there is a dearth of candidates waiting to replace them. As a result, many of the large pharma companies will be forced to put their sizeable cash stores to work in the form of increased R&D expenditures and/or acquisitions of smaller firms. Either way, this funding should result in more clinical research. 
  2. EDC is finally a “proven” technology. While once fairly commonplace, stories of failed EDC implementations seem to have become increasingly rare. It seems that EDC is finally delivering upon its promise to enhance the productivity and ROI of clinical studies. With EDC no longer considered a bleeding-edge approach to clinical trials, the level of EDC adoption is starting to look more like an S-curve. According to IDC, “approximately half nearly half of all new Phase I-III studies are now initiated using EDC.”
  3. The globalization of clinical trials. As clinical trial sponsors are driven to control costs and speed discovery, they are partnering with CROs on an increasingly large portion of their studies. As a result, the CRO industry has been exploding. As competition in the industry increases, these organizations increasingly look towards EDC systems to help make themselves more efficient and deliver higher quality data to the sponsor within a shorter timeframe.

There are undoubtedly other factors driving the growth of EDC, such as the expansion of adaptive trials and the increased maturity and acceptance of independent standards such as those promulgated by CDISC (Clinical Data Interchange Standards Consortium). The intersection of these factors is creating an exciting time in the eclinical industry and 2008 promises to be a dynamic year for new developments and market changes.