eClinical Integration

Increasingly I am seeing real momentum for reducing the costs and barriers to integration of eclinical applications and data in a way that benefits users.

A great example is a recent LinkedIn discussion (you may need to join the group to read it). Several software vendors and industry experts engaged in a dialogue about the pros and cons of different integration approaches. There is an emerging consensus that integration approaches should adopt open web standards and harness the elegance and flexibility of the CDISC Operational Data Model. This consensus may signal a sea change in attitudes toward standards-based integration, making it the norm rather than the exception.

This is not new to members of the OpenClinica community. Over the years we’ve had many examples of such integration efforts described on this blog and at OpenClinica conferences. To make such efforts more powerful, reusable, and robust, the OpenClinica team has invested a great deal over the past year to create a meaningful, CDISC ODM-based model for interacting with OpenClinica. We have incorporated open web standards (RESTful APIs for transport and OAuth for security) to make the interfaces easily accessible with commonly used software tools.  This is part of a newly published resource for OpenClinica development and integration, the OpenClinica 3.1 Technical Documentation Guide. The first version of the specification can be viewed at https://docs.openclinica.com/3.1/technical-documents/rest-api-specifications. I’ve reproduced the introduction here:

Overview

We are constantly looking at ways to make it possible (not to mention reliable and easy!) for users and developers to interact with and extend OpenClinica in a programmatic way. This can mean anything from data loading to more meaningful integrations of applications common to the clinical research environment.

As proponents of open, standards-based interoperability here at OpenClinica, our starting point is always to develop interfaces for these interactions based on the most successful, open, and proven methods in the history of technology – namely the protocols that power the World Wide Web (such as HTTP, SSL, XML, OAuth 2.0). They are relatively simple, extensively documented, widely understood, and well-supported out of the box in a large number of programming and IT environments. On top of this foundation, we rely heavily on the wonderful work of CDISC and the CDISC ODM to model and represent the clinical research protocol and clinical data.

This chapter describes a CDISC ODM-based way to interact with OpenClinica using RESTful APIs and OAuth. The REST web services API relies on HTTP, SSL, XML, and OAuth 2.0. This architecture makes the ODM study protocol representation for an OpenClinica study available and supports other interactions for study design.
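As a rough illustration of what a client call against such an API might look like, here is a minimal Python sketch that builds (but does not send) an HTTPS request for a study's ODM metadata. The host name, the resource path, the study OID, and the token value are all hypothetical, and the sketch assumes the server accepts the OAuth 2.0 access token as a standard Bearer header. Refer to the REST API specification for the actual URLs.

```python
from urllib.parse import quote
from urllib.request import Request


def build_metadata_request(base_url: str, study_oid: str, access_token: str) -> Request:
    """Build a GET request for a study's ODM metadata without sending it."""
    # Hypothetical resource path, used only for illustration.
    url = f"{base_url.rstrip('/')}/studies/{quote(study_oid, safe='')}/metadata"
    return Request(
        url,
        method="GET",
        headers={
            # OAuth 2.0 access token presented as a Bearer credential (assumption).
            "Authorization": f"Bearer {access_token}",
            # ODM is an XML format, so ask for XML.
            "Accept": "application/xml",
        },
    )


# Hypothetical host, study OID, and token.
request = build_metadata_request("https://oc.example.org/rest", "S_STUDYA", "example-token")
print(request.get_method(), request.full_url)
```

Passing the request to urllib.request.urlopen would perform the call over HTTPS; that step is left out here so the sketch runs without a server.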

Why REST?

The OpenClinica RESTful architecture was developed to (initially) support one particular use case, but with the intention of becoming more broadly applicable over time. This use case is based on a frequent request of end users: for OpenClinica to support a visual method for designing, editing, and testing “rules” which define edit checks, email notifications, skip pattern definitions, and the like to be used in OpenClinica CRFs. Until now, users have had to learn how to write rules in XML, which can be confusing and has a steep learning curve for non-technical individuals. The OpenClinica Rule Designer is an application that allows end users to build cross-field edit checks and dynamics within a GUI-based application. It is a centrally hosted Software as a Service (SaaS) application available for OpenClinica Enterprise customers at https://designer.openclinica.com.

To support interaction of the centrally hosted rule designer with any instance of OpenClinica Enterprise installed anywhere in the world, we needed to implement a secure protocol and set of API methods to allow exchange of study information between the two systems, and do so in a way where the user experience was as integrated as if these applications were part of the same code base. In doing so, and by adopting the aforementioned web and clinical standards, we have built an architecture that can be extended and adapted for a much more diverse set of uses.

This chapter specifies how 3rd party applications can interact with an OpenClinica instance via the REST API and OAuth security, and details the currently supported REST API methods. The currently supported API methods are not comprehensive, and you may get better coverage from our SOAP API. However the OpenClinica team is continuing to expand this API and since it is open source anyone may extend it to add new methods to meet their own purposes. If you do use the API in a meaningful way or if you extend the API with new methods, please let others know on the OpenClinica developers list (developers@openclinica.org), and submit your contributions for inclusion back into the codebase – you’ll get better support, increased QA, and compatibility with future OpenClinica releases.

RESTful Representation, based on ODM

“REST”, an acronym for REpresentational State Transfer, describes an architectural style that allows definition and addressing of resources in a stateless manner, primarily through the use of Uniform Resource Identifiers (URIs) and HTTP.

From Wikipedia: A RESTful web service (also called a RESTful web API) is a simple web service implemented using HTTP and the principles of REST. It is a collection of resources, with three defined aspects:

  • the base URI for the web service, such as http://example.com/resources/
  • the Internet media type of the data supported by the web service. This is often JSON, XML or YAML but can be any other valid Internet media type.
  • the set of operations supported by the web service using HTTP methods (e.g., POST, GET, PUT or DELETE).

REST is also a way of looking at the world, as eloquently articulated by Ryan Tomayko.

In the context of REST for clinical research using OpenClinica, we can conceptually think of an electronic case report form (CRF) as a resource that is essentially a bunch of metadata modeled in CDISC ODM with OpenClinica extensions:

  • Some of this metadata (data type, item name, response set, etc) is intrinsic metadata – i.e. tied to the definition of the CRF and its items and mostly unchangeable after it is initially defined.
  • Some of this metadata is representation metadata and used only when the CRF is represented as a web-based HTML form (in the OpenClinica database schema we call this form_metadata, but it also can include other things like CRF version information and rules).

An OpenClinica Event CRF is that same bunch of metadata with the corresponding item data, plus references to the study subject, event definition, CRF version, event ordinal, etc that it pertains to.

  • The notion of a CRF version pertains to the representation of the CRF. It is not intrinsic to the event CRF (this is debatable but it is how OpenClinica models CRFs). Theoretically you should be able to address and view any Event CRF in any available version of the CRF (i.e. http://oc/RESTpath/StudyA/Subj1234/VisitA/FormB/v1/edit and http://oc/RESTpath/StudyA/Subj1234/VisitA/FormB/v2/edit both show you the same data represented in different versions of the CRF). Of course the audit history needs to clearly show which version/representation of the CRF was used for key events such as data capture, signature, etc.
  • Rules are also part of the representation metadata as opposed to intrinsic metadata, even though you don’t need to specify them on a version-by-version basis.
  • Anything attached to the actual event CRF object or its item data – discrepancy notes, audit trails, signatures, SDV performance, etc is part of that event data and should be addressable in the same manner (e.g. http://oc/RESTpath/StudyA/Subj1234/VisitA/FormB/v1/GROUPOID/ORDINAL/ITEM…)

In this conceptual view of the world, CRFs (as well as CRF items, studies, study events, etc.) are RESTful resources with core, intrinsic properties and then some other metadata that has to do with how they are presented in a particular representation. We now have a model that allows us a great deal of flexibility and adaptability. We can support multiple modalities, with different representation metadata for rendering the same form, or perhaps the same representation metadata applied in a different way. We can address any part of the CRF in an atomic manner. This approach has been successfully applied in the Rule Designer, which takes the ODM study metadata and lets users browse the study CRFs and items, with the ability to drag and drop those resources into rule expressions. Here are some examples of additional future capabilities that could be easily realized on top of this architecture:

  • Multiple data entry modalities – a user may need to deploy patient based data entry via web, a tablet, a thick client, or even paper/OCR, each with a very different presentation. Each of these may be part of OpenClinica-web or a separate application altogether, but all will rely on the same resource metadata to represent the CRF (according to the UI + logic appropriate for that modality), and use the same REST-based URL and method for submitting/validating the data.
  • Apply a custom view (an XSL or HTML/CSS) to a patient event CRF or full casebook – some uses of this could be to represent it as a PDF casebook, show it with all audit trails/DNs embedded in line with the CRF data, show a listing of data for that subject, or provide it (via an XSL mapping) as an XForm or HL7 CCD document for use by another application – http://oc/RESTpath/StudyA/Subj1234/VisitA/FormB/v1/view?renderer=somemap…
  • The same path used in the URLs, eg http://oc/RESTpath/StudyA/Subj1234/VisitA/FormB/v1/GROUPOID/ORDINAL/ITEMOID could be used as the basis for XPath expressions operating on ODM XML representations of CRFs and of event CRF data
  • Internationalization – OpenClinica ought to allow our CRF representation metadata to have an additional sub-layer to render the form in different languages, and then automatically show the appropriate language based on client/server HTTP negotiation (like we do with the rest of the app). Currently internationalization of CRFs requires versioning the CRF.
  • View CRF & Print CRF – use the same representation metadata (form metadata) but apply slightly different rules on how the presentation works (text values instead of form fields, no buttons, turn drop down lists into text values)
  • Discrepancy manager popup – one requested use case would allow a user to update a single event CRF item data value directly from the discrepancy note UI point of view. In this case you could think of just updating that one item as addressing the resource http://oc/RESTpath/StudyA/Subj1234/VisitA/FormB/v1/GROUPOID/ORDINAL/ITEM…. In this model, whatever rules and presentation metadata need to get applied at presentation and save time happen automatically.
  • Import of CDISC ODM XML files – imported data would be processed through the same model, but only use the metadata that’s relevant to the data import modality. The same goes for data coming in as raw ODM XML via a REST web service. Often the import only populates one part of a CRF and the other parts are expected to be finished via data entry. This model would help us manage that process better than the current implementation of ODM data import.
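The idea of reusing the conceptual URL path as the basis for an XPath expression over ODM XML can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample document is a hand-written fragment using the generic ODM 1.3 clinical data elements, the OIDs (IG_VITALS, I_WEIGHT) are invented, the CRF version segment is left out of the path because it belongs to the representation rather than the data, and the helper name path_to_xpath is hypothetical.

```python
import xml.etree.ElementTree as ET

ODM_NS = {"odm": "http://www.cdisc.org/ns/odm/v1.3"}

# Hand-written ODM fragment for illustration only.
SAMPLE_ODM = """<ODM xmlns="http://www.cdisc.org/ns/odm/v1.3">
  <ClinicalData StudyOID="StudyA" MetaDataVersionOID="v1.0.0">
    <SubjectData SubjectKey="Subj1234">
      <StudyEventData StudyEventOID="VisitA">
        <FormData FormOID="FormB">
          <ItemGroupData ItemGroupOID="IG_VITALS" ItemGroupRepeatKey="1">
            <ItemData ItemOID="I_WEIGHT" Value="72.5"/>
          </ItemGroupData>
        </FormData>
      </StudyEventData>
    </SubjectData>
  </ClinicalData>
</ODM>"""


def path_to_xpath(path: str) -> str:
    """Turn Study/Subject/Event/Form/Group/Ordinal/Item into an XPath expression."""
    study, subject, event, form, group, ordinal, item = path.strip("/").split("/")
    return (
        f"./odm:ClinicalData[@StudyOID='{study}']"
        f"/odm:SubjectData[@SubjectKey='{subject}']"
        f"/odm:StudyEventData[@StudyEventOID='{event}']"
        f"/odm:FormData[@FormOID='{form}']"
        f"/odm:ItemGroupData[@ItemGroupOID='{group}'][@ItemGroupRepeatKey='{ordinal}']"
        f"/odm:ItemData[@ItemOID='{item}']"
    )


root = ET.fromstring(SAMPLE_ODM)
xpath = path_to_xpath("StudyA/Subj1234/VisitA/FormB/IG_VITALS/1/I_WEIGHT")
item = root.find(xpath, ODM_NS)
value = item.get("Value")
print(value)
```

Each path segment maps to one level of the ODM clinical data hierarchy, which is what makes a single item addressable in an atomic manner.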

There are many considerations related to user roles and permissions, workflows, and event CRF/item data status attributes that need to be overlaid on top of this REST model, but the model itself is a conceptually useful way to think about clinical trials and the information represented therein. When implemented using CDISC ODM XML syntax it becomes even more powerful. As widespread support for ODM becomes the norm, the barriers to true interoperability – shared, machine-readable study protocol definitions, and robust, real-time, ALCOA-compliant exchange of clinical data and metadata that aligns with users’ business processes – get eviscerated.

* This chapter frequently refers to ODM-based representations of study metadata and clinical data in OpenClinica. We strive as much as possible to implement ODM-based representations of OpenClinica metadata and data according to the generic ODM specifications (currently using ODM version 1.3). However, to ensure our representations support the full richness of information used in OpenClinica we often have to rely on ODM’s vendor extensions capability. We have not always made distinctions in this chapter as to where we are using ‘generic’ ODM versus OpenClinica extensions, but that is documented here. It is our goal as ODM matures and supports richer representations of study information to migrate our extensions back into the generic ODM formats.

** Also note the RESTful URL patterns referred to above are conceptual. Refer to the technical subchapters of this REST API specification for the actual URLs.

The spec (like much of the code that implements it) is open source. I’m looking forward to hearing comments and feedback, and sharing thoughts on how we can encourage broader adoption across different types of eclinical applications.

Plug-in Architecture for OpenClinica Data Extracts

A major part of the Akaza mission is to make OpenClinica more flexible and customizable. Having a code base that is open source is a great place to start. But not everybody wants to develop Java code to meet their own requirements. We aim wherever possible to add configuration options and easy-to-use design tools within the user interface, but not all problems are a good fit for that approach. The solution is a series of “plug-in” interfaces that allow users to add their own capabilities and configurations, or interact with other applications. Some of these interfaces, such as loading of spreadsheet-based CRF definitions, are a critical part of OpenClinica, without which the system would not be functional. Other interfaces include CDISC ODM data import, job scheduler for import and export, SOAP-based web services, and the HTML5 popup interface that allows 3rd party applications to enter CRF data. Along the way community members have improved these interfaces and taught us a lot about how to design them better.

OpenClinica 3.1 will include a completely rewritten version of Extract Data based around a plug-in architecture that increases flexibility and functionality. We’ve learned that user requirements for organizing, formatting, and presenting data are tremendously diverse (and often conflicting), depending on the user, the intended purpose, the study, and the organization. Our old Extract Data architecture made it difficult to add new output formats or tweak the ones already there. The new functionality provides a highly extensible, easily configurable means to get data formats that meet a user’s precise requirements. It does this by:

  • Using XSL stylesheet transformations to read native CDISC ODM XML and output the data in a transformed format.
  • Specifying available formats, their associated stylesheets, and associated properties (like filename, archival settings, and whether to compress the file) in a properties file (the extract.properties file)
  • Optionally, enabling postprocessing of the transformed data to output to certain non-text file formats and destinations
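To make the configuration idea more concrete, here is a small Python sketch that reads a properties-style listing of extract formats into a dictionary keyed by format number. The key names (file, description, exportname, zip) and the stylesheet file names are assumptions made for illustration; they are not taken from the shipped extract.properties file, which should be consulted for the real keys.

```python
def parse_extract_formats(text: str) -> dict[int, dict[str, str]]:
    """Group 'extract.<n>.<property>=<value>' lines by format number."""
    formats: dict[int, dict[str, str]] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and anything that is not key=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        parts = key.strip().split(".")
        if len(parts) != 3 or parts[0] != "extract" or not parts[1].isdigit():
            continue
        formats.setdefault(int(parts[1]), {})[parts[2]] = value.strip()
    return formats


# Hypothetical configuration entries.
SAMPLE_PROPERTIES = """\
# Format 1: tab-delimited text
extract.1.file=odm_to_tab.xsl
extract.1.description=Tab-delimited Text
extract.1.exportname=extract_tab.tsv
extract.1.zip=true
# Format 2: HTML
extract.2.file=odm_to_html.xsl
extract.2.description=HTML
extract.2.exportname=extract.html
extract.2.zip=false
"""

formats = parse_extract_formats(SAMPLE_PROPERTIES)
for number, settings in sorted(formats.items()):
    print(number, settings["description"], "->", settings["file"])
```

Keeping each format as a numbered group of properties means a new output format is a matter of adding a stylesheet and a few configuration lines rather than changing Java code.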

We started out with a desire to simplify the native output of the OpenClinica Extract Data java application, in a way that increased quality, stability, completeness, and performance. From now on, the OpenClinica core application will only produce CDISC ODM (version 1.3, with OpenClinica Extensions) as the natively supported format. With only one native format, we’re better able to test, document, and guarantee the output. All other output formats generated are transformations from this native ODM 1.3 w/extensions format. We made sure (via the OpenClinica vendor extensions) that we can export all possible data related to a study and its clinical data in this format. In 3.1, this also includes export of audit trail, discrepancy, and electronic signature information.

After we devised a way to improve the quality, stability, and performance of the data coming out of the core, we needed to provide a way to execute the data transformations into any of a wide variety of outputs. It was important for us to adopt standard, widely used formats and open source technologies as the basis for these transformations. We selected the XSLT (Extensible Stylesheet Language Transformations) language because of its applicability to CDISC ODM XML, extensive features, and reasonably simple learning curve. The implementation of these transformations is powered by a widely used open source engine, the Saxon XSLT and XQuery processor. The behavior of Extract Data is determined by the extract.properties configuration file and the XSL stylesheets. The extract.properties file specifies the data formats available in the system, each with a corresponding XSL stylesheet. OpenClinica 3.1 by default includes a set of XSL stylesheet transformations for commonly used formats, such as HTML, Tab-delimited Text, and SPSS. The OpenClinica Enterprise Edition will include additional new formats including SAS, annotated CRFs, printable PDF casebooks with integrated audit trail and discrepancy notes, and SQL-based data marts with a normalized CRF-based table structure for ad-hoc reporting.

At this point, we can now reproduce the extract functionality available in OpenClinica 3.0, at a higher level of quality and stability. The stylesheets replicate the HTML, SPSS, tab-delimited, and multiple CDISC XML formats that were available in 3.0, and the framework will make it much easier to add new formats. However all of these data output formats are some type of text or XML based file. Users have also voiced the need to do things that XSLT cannot do by itself, like produce PDF files or load the data into external relational databases for ad-hoc reporting. The solution was implementation of a postprocessor framework that allows more sophisticated functionality. With postprocessing we can do things like generate binary output formats or send data to a target destination. Two postprocessors are included in 3.1 by default: output to a database using JDBC connectivity and generating PDF files using XSL-FO. The postprocessing step is transparent to end-users; they simply get their files for download or alternatively receive a message that the data has been loaded into the database. And the framework exists to add additional postprocessors via the addition of Java classes with references to those class names in the extract.properties file.

Execution of data export occurs when a user or job initiates a request for data. The request includes the active study or site, the dataset id, and the requested format. The end user will notice only minor differences in how they use the Extract Module. The process of creating datasets has not changed. The Extract can still be initiated from the ‘Download Data’ screen or via a job by selecting the desired output format. At this point, however, rather than waiting for the download page to load, the user will be told that their extract is in the queue, and will receive an email and on-screen notification when the extract is complete. Execution follows a four-step process:

Step 1.   Generate native CDISC ODM XML version 1.3 with OpenClinica Extensions

Step 2.   Apply XSL transformation and generate output file according to the settings in extract.properties for the specified format

Step 3.   Optionally, if postprocessing is enabled for the requested format, run the post processing action according to the settings in extract.properties.

Step 4.   Provide user notification with success or failure message.
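The four steps above can be sketched as a small Python pipeline. This is a toy model, not the OpenClinica implementation: the real system applies XSL stylesheets with Saxon, while this sketch swaps in a plain Python function as a stand-in for the transformation because the Python standard library has no XSLT engine. The function names, the sample ODM fragment, and the notification messages are all hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Optional

ODM_NS = "http://www.cdisc.org/ns/odm/v1.3"
SUBJECT_TAG = "{" + ODM_NS + "}SubjectData"
ITEM_TAG = "{" + ODM_NS + "}ItemData"


def generate_odm() -> str:
    """Step 1 stand-in: return native ODM XML (hand-written sample here)."""
    return (
        '<ODM xmlns="' + ODM_NS + '"><ClinicalData StudyOID="StudyA" '
        'MetaDataVersionOID="v1.0.0"><SubjectData SubjectKey="Subj1234">'
        '<StudyEventData StudyEventOID="VisitA"><FormData FormOID="FormB">'
        '<ItemGroupData ItemGroupOID="IG_VITALS">'
        '<ItemData ItemOID="I_WEIGHT" Value="72.5"/>'
        "</ItemGroupData></FormData></StudyEventData></SubjectData>"
        "</ClinicalData></ODM>"
    )


def transform_to_tab(odm_xml: str) -> str:
    """Step 2 stand-in: plain Python in place of an XSL stylesheet."""
    root = ET.fromstring(odm_xml)
    rows = ["SubjectKey\tItemOID\tValue"]
    for subject in root.iter(SUBJECT_TAG):
        for item in subject.iter(ITEM_TAG):
            rows.append("\t".join(
                [subject.get("SubjectKey"), item.get("ItemOID"), item.get("Value")]
            ))
    return "\n".join(rows)


def run_extract(
    transform: Callable[[str], str],
    postprocess: Optional[Callable[[str], str]] = None,
) -> tuple[Optional[str], str]:
    """Run steps 1 to 4 and return (output, notification message)."""
    try:
        odm_xml = generate_odm()            # Step 1: native ODM
        output = transform(odm_xml)         # Step 2: transformation
        if postprocess is not None:         # Step 3: optional postprocessing
            output = postprocess(output)
    except Exception as exc:                # Step 4: failure notification
        return None, f"Extract failed: {exc}"
    return output, "Extract complete"       # Step 4: success notification


output, message = run_extract(transform_to_tab)
print(message)
print(output)
```

Because every format starts from the same native ODM, only the transform and the optional postprocessor differ from one output format to the next.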

We’ve also improved the logging and messaging surrounding extracts, which will be crucial for anyone developing, customizing, or debugging XSL stylesheets. As always, full internationalization is supported – if you want a value to be internationalized, it should be prefaced with an & (ampersand) symbol in the extract.properties file, and the corresponding text placed in the notes.properties i18n files.
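The ampersand convention can be pictured with a short Python sketch. It assumes that a value starting with & names a key in the i18n text, and that an unknown key falls back to the key itself; both the fallback behavior and the sample keys are assumptions for illustration, not documented behavior.

```python
def resolve_label(value: str, translations: dict[str, str]) -> str:
    """Return the localized text for '&key' values, or the literal value otherwise."""
    if value.startswith("&"):
        key = value[1:]
        # Assumed fallback: show the key if no translation is available.
        return translations.get(key, key)
    return value


# Hypothetical i18n entries for one locale.
translations_es = {"extract_success": "La extracción ha finalizado."}

print(resolve_label("&extract_success", translations_es))
print(resolve_label("Run Now", translations_es))
```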

As is common with software, we didn’t get to do everything we wanted in the first release of these capabilities. Some future features include:

  • Allow extract formats to be restricted to specific users, studies/sites, and/or datasets.
  • Allow loading and validation of formats within the web UI or via web services rather than via the extract.properties config file.
  • Create an exchange for XSL formats similar to the CRF Library.

Other than that we think we’ve thought of everything :-). Have we?

– Cal Collins

Pipes, Hats … and OpenClinica: Digesting HL7 in OpenClinica

If you’re an OpenClinica administrator somewhere, the chances are good somebody has asked you: “Can OpenClinica handle HL7 messaging?”

“No, it doesn’t,” you’ve said.

You probably said that with a sigh of relief because HL7 is a byzantine data exchange standard whose complexity keeps an army of consultants employed and drives neophytes like myself to madness.  The HL7 2.X specification uses eye-fatiguing pipes (“|”) and hats (“^”) as delimiters and has been referred to by experts as the “non-standard standard” (see this). Unfortunately, it is also the lingua franca of health-care messaging currently, and will likely continue to be for a long time to come.

So, it is with a heavy sense of resignation that we here at Geneuity are taking up the challenge to make OpenClinica fluent in HL7. As a contract clinical laboratory, we are particularly interested in having OpenClinica able to digest HL7 ORU messages that convey lab results.  This article details our first pass at the problem.

Our approach is shown in Figure 1.  It makes use of Mirth and a new web service in OpenClinica developed by Geneuity called EventDataInsert.  As shown, an HL7 message containing a lab result is sent by TCP to a Mirth channel which is configured to transform it into a SOAP message palatable to EventDataInsert.  EventDataInsert reads the message and then sees if the specimen has already been accessioned into OpenClinica.  If so, it inserts the data into the underlying database and signals a successful entry.  If not, it does nothing and signals a rejection.  These signals are transmitted back to Mirth which issues a standard HL7 acknowledgment (ACK) message coded with either ‘AA’ for ‘Application Accept’ or ‘AR’ for ‘Application Reject’.  It is the responsibility of whoever (or whatever) sent the HL7 message in the first place to follow up when a lab result is rejected.
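For readers who have not seen the pipes and hats up close, here is a minimal Python sketch of the two ends of that exchange: picking the lab result out of an ORU message and building the AA or AR acknowledgment. In the approach described here this work is done by a Mirth channel, so the Python below is only a stand-in to show the message shapes. The sample message is invented, and placing the accession number in OBR-3 is an assumption that will vary between senders.

```python
# Invented ORU^R01 message; HL7 2.x segments are separated by carriage returns.
SAMPLE_ORU = "\r".join([
    r"MSH|^~\&|LAB|GENEUITY|OC|SITE|202001011200||ORU^R01|MSG0001|P|2.3",
    "PID|1||SUBJ1234",
    "OBR|1||ACC-0001",
    "OBX|1|NM|GLU^Glucose||5.4|mmol/L",
]) + "\r"


def parse_oru(message: str) -> dict:
    """Extract the control id, accession number, and OBX results."""
    parsed = {"control_id": None, "accession": None, "observations": []}
    for segment in filter(None, message.split("\r")):
        fields = segment.split("|")
        if fields[0] == "MSH":
            # MSH-1 is the separator itself, so MSH-n sits at index n - 1.
            parsed["sender"] = (fields[2], fields[3])
            parsed["receiver"] = (fields[4], fields[5])
            parsed["control_id"] = fields[9]
        elif fields[0] == "OBR":
            parsed["accession"] = fields[3]  # assumed location of the accession number
        elif fields[0] == "OBX":
            code = fields[3].split("^")[0]   # first component of the observation id
            parsed["observations"].append((code, fields[5], fields[6]))
    return parsed


def build_ack(parsed: dict, accepted: bool) -> str:
    """Build an ACK with 'AA' (accept) or 'AR' (reject) for the original message."""
    app, facility = parsed["receiver"]
    to_app, to_facility = parsed["sender"]
    msh = "|".join(["MSH", "^~\\&", app, facility, to_app, to_facility,
                    "", "", "ACK", "ACK" + parsed["control_id"], "P", "2.3"])
    msa = "|".join(["MSA", "AA" if accepted else "AR", parsed["control_id"]])
    return msh + "\r" + msa + "\r"


parsed = parse_oru(SAMPLE_ORU)
ack = build_ack(parsed, accepted=True)
print(parsed["accession"], parsed["observations"])
print(ack.replace("\r", "\n"))
```

The MSA segment echoes the control id of the original message, which is how the sender can tell which lab result was accepted or rejected.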

To develop this strategy, we used several tools.  To generate HL7 test messages, we utilized the HL7 generator (freely available here) made by the people responsible for the ELINCS initiative.  To send and receive HL7 messages to and from Mirth via TCP, we used Netcat, another freely available utility.

And there you have it!  Of course, the HL7 standard covers much more than the delivery of lab results, but this exercise is most relevant to our concerns and represents an important first step in making OpenClinica talk the talk when it comes to HL7.

Figure 1:  HL7 Strategy for OpenClinica

First, an HL7 message conveying lab results is sent to a Mirth channel listening for TCP requests. Mirth parses the message and transforms it into a SOAP message, which it then hands off to the EventDataInsert web service listening within OpenClinica. EventDataInsert checks whether the specimen to which the lab result pertains has been accessioned into OpenClinica’s underlying database. If so, it inserts the results and signals the success back to Mirth. If not, it enters nothing and signals the rejection back to Mirth. Mirth digests these signals and sends an appropriately coded ACK message back to the sender via TCP.

Facilitated Data Entry of Lab Results Using OpenClinica’s New Web Services Feature

As mentioned previously, we at Geneuity Clinical Research Services are big fans of OpenClinica and are even more so now with the upcoming release of version 3.0 with its new web services capability.  This article describes how we exploit this new feature to help automate entry of lab results, a particularly important topic given that we do lots of batch testing of specimens and oftentimes test the same specimen for many different analytes.

Prior to 3.0, you had three options when it came to CRF data entry.  The first was to log into OpenClinica’s web interface and manually enter your data.  This was no problem so long as you didn’t have lots and lots of data.  But we did.

Alternatively, you could upload a flat file of your data as long as it was formatted in XML and associated with the appropriate subject id’s and visit descriptions.  Assembling this file wasn’t trivial though and manually looking up each specimen’s subject and event nearly defeated the purpose of the procedure, which was to save time and effort.

Finally, you could do what we did: write custom code to automate the job. Lab data is amenable to this sort of approach because it is always tagged with something called an accession number that uniquely identifies it. When designing CRFs, we always make sure to include a field for the event’s accession number, and when a specimen first arrives through our door the first thing we do is log into OpenClinica and enter the specimen’s accession number in the appropriate event’s CRF. Because the number is unique to the study, this entry effectively tags the event and provides a ‘hook’ inside the database, so that the event_crf_id of any data item subsequently annotated with the accession number can be easily looked up using a database query like so: SELECT event_crf_id FROM item_data WHERE value = '<accession_number>'. This, in turn, gives you the requisite information to insert the lab data like so: INSERT INTO item_data VALUES ('event_crf_id', 'value' …), provided you also know the item_id.
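The lookup-then-insert pattern can be tried out against an in-memory SQLite database. The sketch below is a simplification made for illustration: the item_data table has only four columns, the ids and accession numbers are invented, and the function name insert_lab_result is hypothetical. The real OpenClinica table has more columns and constraints.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified stand-in for the item_data table.
conn.execute(
    "CREATE TABLE item_data ("
    "item_data_id INTEGER PRIMARY KEY, event_crf_id INTEGER, "
    "item_id INTEGER, value TEXT)"
)
# The accession number keyed in when the specimen arrived (item_id 10 here).
conn.execute(
    "INSERT INTO item_data (event_crf_id, item_id, value) VALUES (?, ?, ?)",
    (501, 10, "ACC-0001"),
)


def insert_lab_result(conn: sqlite3.Connection, accession: str, item_id: int, value: str) -> bool:
    """Insert a result into the event CRF tagged with the accession number."""
    rows = conn.execute(
        "SELECT DISTINCT event_crf_id FROM item_data WHERE value = ?",
        (accession,),
    ).fetchall()
    if len(rows) != 1:
        # Unknown or ambiguous accession number: do nothing and report rejection.
        return False
    conn.execute(
        "INSERT INTO item_data (event_crf_id, item_id, value) VALUES (?, ?, ?)",
        (rows[0][0], item_id, value),
    )
    conn.commit()
    return True


accepted = insert_lab_result(conn, "ACC-0001", 11, "5.4")
rejected = insert_lab_result(conn, "ACC-9999", 11, "5.4")
print(accepted, rejected)
```

Parameterized queries are used in place of string concatenation so that values arriving from outside cannot alter the SQL. A production version would probably also restrict the lookup to the item_id of the accession number field, so that a lab value that happens to equal an accession number cannot match.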

To implement this strategy, we wrote custom servlets that operated within the context of our OpenClinica installations.  More recently, we configured MirthConnect channels to do the same.   They worked well and data entry was greatly expedited, but the coding was complex and had to be refactored over and over again for each study and for every CRF change.  While helpful, this strategy wasn’t sustainable in the long run.

Luckily, the latest version of OpenClinica provides a way out.  It incorporates the Spring WS Framework which allows programmers to write something called a ‘web service.’  A web service digests and acts upon XML data sent to it on an on-demand basis over a network.  The source need not be a human being uploading data on a web form, but, more usefully, it can be, say, a clinical testing platform automatically spitting out HL7 messages.  This, of course, is ideal in our case.  So we wrote a web service called ‘EventDataInsert’ that parses XML containing lab data values annotated with accession numbers and item names, looks up the corresponding event_crf_id’s and item_id’s, and inserts the data into item_data accordingly.  The service is generic enough so that it doesn’t have to be refactored for each and every study, but it does make some critical assumptions.  Namely, it assumes that both accession numbers and item names are unique.  So care has to be taken to ensure both these preconditions are met.
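The uniqueness assumption is worth checking before any data is inserted. The Python sketch below parses a made-up XML payload and resolves item names to item ids, refusing to proceed if an item name is defined more than once. The payload layout, the element and attribute names, and the item ids are all hypothetical, since the actual message schema of EventDataInsert is not shown here.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical payload layout, for illustration only.
PAYLOAD = """<labResults>
  <result accession="ACC-0001" itemName="GLUCOSE" value="5.4"/>
  <result accession="ACC-0001" itemName="SODIUM" value="140"/>
</labResults>"""


def parse_results(xml_text: str) -> list[tuple[str, str, str]]:
    """Return (accession, item name, value) for each result element."""
    root = ET.fromstring(xml_text)
    return [
        (r.get("accession"), r.get("itemName"), r.get("value"))
        for r in root.iter("result")
    ]


def build_item_lookup(defined_items: list[tuple[str, int]]) -> dict[str, int]:
    """Map item name to item_id, refusing names that are defined more than once."""
    counts = Counter(name for name, _ in defined_items)
    duplicates = sorted(name for name, n in counts.items() if n > 1)
    if duplicates:
        raise ValueError(f"item names are not unique: {duplicates}")
    return dict(defined_items)


# Invented (item name, item_id) pairs as they might be defined in a study.
lookup = build_item_lookup([("GLUCOSE", 11), ("SODIUM", 12)])
resolved = [(acc, lookup[name], value) for acc, name, value in parse_results(PAYLOAD)]
print(resolved)
```

Failing loudly on a duplicate item name is a design choice for this sketch; it avoids silently writing a value to the wrong item.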

The power of EventDataInsert doesn’t just lie in the fact that it handles inserts on an unattended basis, but also in that, like most web services, it requires only simple XML as input. The latter makes the source of the data irrelevant as long as it can be correctly mapped and transformed into XML. We often use MirthConnect to do this, using its easy-to-use graphical interface to configure channels between incoming raw data and OpenClinica’s web-service interfaces.

The figure below shows a typical deployment of OpenClinica at Geneuity.  MirthConnect is used not only to get data into OpenClinica but also to generate canned PDF reports of the results.  This scenario works for us and gets easier and easier to maintain as OpenClinica evolves new electronic data capture features and makes old ones ever more robust.

Diagram of OpenClinica at Geneuity Clinical Research Services