Plug-in Architecture for OpenClinica Data Extracts

A major part of the Akaza mission is to make OpenClinica more flexible and customizable. An open source code base is a great place to start, but not everybody wants to write Java code to meet their own requirements. Wherever possible we add configuration options and easy-to-use design tools to the user interface, but not every problem is a good fit for that approach. The solution is a series of “plug-in” interfaces that let users add their own capabilities and configurations, or interact with other applications. Some of these interfaces, such as loading of spreadsheet-based CRF definitions, are a critical part of OpenClinica; without them the system would not be functional. Other interfaces include CDISC ODM data import, a job scheduler for import and export, SOAP-based web services, and the HTML5 popup interface that allows third-party applications to enter CRF data. Along the way, community members have improved these interfaces and taught us a lot about how to design them better.

OpenClinica 3.1 will include a completely rewritten version of Extract Data, built around a plug-in architecture that increases flexibility and functionality. We’ve learned that user requirements for organizing, formatting, and presenting data are tremendously diverse (and often conflicting), depending on the user, the intended purpose, the study, and the organization. Our old Extract Data architecture made it difficult to add new output formats or tweak existing ones. The new functionality provides a highly extensible, easily configurable way to produce data in formats that meet a user’s precise requirements. It does this by:

  • Using XSL stylesheet transformations to read native CDISC ODM XML and output the data in a transformed format.
  • Specifying the available formats, their associated stylesheets, and associated properties (such as filename, archival settings, and whether to compress the file) in a properties file, extract.properties.
  • Optionally, postprocessing the transformed data to produce certain non-text file formats or deliver it to other destinations.
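As a rough illustration of how such a properties file can drive format selection, the following Python sketch groups numbered keys into one format definition each. The key names (extract.N.file, extract.N.exportname, extract.N.zip, extract.N.post) are assumptions chosen for illustration, not necessarily the keys in the shipped extract.properties, and the parser handles only simple key=value lines rather than the full Java properties syntax.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ExtractFormat:
    number: int                          # N in the hypothetical extract.N.* keys
    stylesheet: str                      # XSL file applied to the native ODM output
    export_name: str                     # base filename of the generated file
    zip_output: bool = False             # compress the result?
    postprocessor: Optional[str] = None  # optional postprocessing step name


def parse_properties(text: str) -> Dict[str, str]:
    """Parse simple key=value lines, skipping blank lines and #/! comments."""
    props: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def load_formats(text: str) -> Dict[int, ExtractFormat]:
    """Group the hypothetical extract.N.* keys into one ExtractFormat per N."""
    props = parse_properties(text)
    numbers = sorted({int(key.split(".")[1]) for key in props
                      if key.startswith("extract.") and key.split(".")[1].isdigit()})
    formats: Dict[int, ExtractFormat] = {}
    for n in numbers:
        prefix = f"extract.{n}."
        formats[n] = ExtractFormat(
            number=n,
            stylesheet=props[prefix + "file"],
            export_name=props.get(prefix + "exportname", f"extract_{n}"),
            zip_output=props.get(prefix + "zip", "false").lower() == "true",
            postprocessor=props.get(prefix + "post") or None,
        )
    return formats


# Illustrative input; filenames and values are made up.
SAMPLE = """
# Tab-delimited text, zipped
extract.1.file=tab_delimited.xsl
extract.1.exportname=tab_export.tsv
extract.1.zip=true
# Data mart load handled by a postprocessor
extract.2.file=sql_mart.xsl
extract.2.post=sql
"""

formats = load_formats(SAMPLE)
```

Keeping each format as a self-contained group of keys means a new format can be added by appending lines to the file, without touching the code that reads it.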

We started out wanting to simplify the native output of the OpenClinica Extract Data Java application in a way that increased quality, stability, completeness, and performance. From now on, the only format the OpenClinica core application produces natively is CDISC ODM (version 1.3, with OpenClinica Extensions). With only one native format, we’re better able to test, document, and guarantee the output. All other output formats are transformations of this native ODM 1.3 format with extensions. Through the OpenClinica vendor extensions, we made sure that all data related to a study and its clinical data can be exported in this format. In 3.1, this also includes audit trail, discrepancy, and electronic signature information.

Once we had improved the quality, stability, and performance of the data coming out of the core, we needed a way to transform that data into any of a wide variety of outputs. It was important for us to base these transformations on standard, widely used formats and open source technologies. We selected XSLT (Extensible Stylesheet Language Transformations) because of its applicability to CDISC ODM XML, its extensive features, and its reasonably gentle learning curve. The transformations are run by a widely used open source engine, the Saxon XSLT and XQuery processor. The behavior of Extract Data is determined by the extract.properties configuration file and the XSL stylesheets: extract.properties lists the data formats available in the system, each with a corresponding XSL stylesheet. By default, OpenClinica 3.1 includes stylesheets for commonly used formats such as HTML, tab-delimited text, and SPSS. The OpenClinica Enterprise Edition will include additional formats, including SAS, annotated CRFs, printable PDF casebooks with integrated audit trail and discrepancy notes, and SQL-based data marts with a normalized, CRF-based table structure for ad-hoc reporting.
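To make the ODM-to-tab-delimited idea concrete, here is a minimal Python sketch of the kind of flattening a tab-delimited stylesheet performs. Python's standard library has no XSLT processor, so this swaps in xml.etree.ElementTree for XSLT and Saxon; the one-row-per-subject layout, the sample study, and the item OIDs are all hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

ODM_NS = {"odm": "http://www.cdisc.org/ns/odm/v1.3"}


def odm_to_tab_delimited(odm_xml: str) -> str:
    """Flatten ODM ClinicalData to one tab-delimited row per subject."""
    root = ET.fromstring(odm_xml)
    columns: List[str] = []          # ItemOIDs in first-seen order
    rows: List[Dict[str, str]] = []
    for subject in root.iterfind(".//odm:ClinicalData/odm:SubjectData", ODM_NS):
        row = {"SubjectKey": subject.get("SubjectKey", "")}
        for item in subject.iterfind(".//odm:ItemData", ODM_NS):
            oid = item.get("ItemOID", "")
            if oid not in columns:
                columns.append(oid)
            # A repeated ItemOID (e.g. in another event) overwrites the earlier
            # value here; a real stylesheet would add event/repeat columns.
            row[oid] = item.get("Value", "")
        rows.append(row)
    header = ["SubjectKey"] + columns
    lines = ["\t".join(header)]
    lines += ["\t".join(row.get(col, "") for col in header) for row in rows]
    return "\n".join(lines)


# Hypothetical two-subject study in ODM 1.3 form.
SAMPLE_ODM = """<ODM xmlns="http://www.cdisc.org/ns/odm/v1.3" ODMVersion="1.3">
  <ClinicalData StudyOID="S_DEMO">
    <SubjectData SubjectKey="SS_001">
      <StudyEventData StudyEventOID="SE_BASELINE">
        <FormData FormOID="F_VITALS">
          <ItemGroupData ItemGroupOID="IG_VITALS">
            <ItemData ItemOID="I_WEIGHT" Value="72"/>
            <ItemData ItemOID="I_HEIGHT" Value="180"/>
          </ItemGroupData>
        </FormData>
      </StudyEventData>
    </SubjectData>
    <SubjectData SubjectKey="SS_002">
      <StudyEventData StudyEventOID="SE_BASELINE">
        <FormData FormOID="F_VITALS">
          <ItemGroupData ItemGroupOID="IG_VITALS">
            <ItemData ItemOID="I_WEIGHT" Value="65"/>
          </ItemGroupData>
        </FormData>
      </StudyEventData>
    </SubjectData>
  </ClinicalData>
</ODM>"""

table = odm_to_tab_delimited(SAMPLE_ODM)
```

In the pipeline described above this logic would live in an XSL stylesheet instead, which is what allows formats to be added or changed without modifying Java code.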

At this point, we can reproduce the extract functionality available in OpenClinica 3.0 at a higher level of quality and stability. The stylesheets replicate the HTML, SPSS, tab-delimited, and multiple CDISC XML formats that were available in 3.0, and the framework makes it much easier to add new formats. However, all of these output formats are text- or XML-based files. Users have also voiced the need to do things that XSLT cannot do by itself, such as producing PDF files or loading the data into external relational databases for ad-hoc reporting. Our solution is a postprocessor framework that allows more sophisticated functionality, such as generating binary output formats or sending data to a target destination. Two postprocessors are included in 3.1 by default: output to a database over JDBC, and PDF generation using XSL-FO. The postprocessing step is transparent to end users; they simply get their files for download, or receive a message that the data has been loaded into the database. Additional postprocessors can be added by writing Java classes and referencing their class names in the extract.properties file.
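The postprocessor idea can be sketched as a small interface plus a registry. In this hypothetical Python version, a dict keyed by the postprocessor name stands in for the Java class-name lookup in extract.properties, and the standard library's sqlite3 module stands in for a JDBC database target; the class and table names are invented for illustration.

```python
import sqlite3
from abc import ABC, abstractmethod
from typing import Callable, Dict


class PostProcessor(ABC):
    @abstractmethod
    def process(self, transformed: str) -> str:
        """Consume the XSLT output and return a user-facing status message."""


class SqlLoadPostProcessor(PostProcessor):
    """Loads tab-delimited output (header row first) into a database table."""

    def __init__(self, conn: sqlite3.Connection, table: str = "extract_data"):
        self.conn = conn
        self.table = table

    def process(self, transformed: str) -> str:
        # Assumes every row has as many fields as the header and that header
        # names contain no double quotes; sqlite3 raises an error otherwise.
        header, *rows = [line.split("\t") for line in transformed.splitlines()]
        cols = ", ".join(f'"{c}" TEXT' for c in header)
        self.conn.execute(f'CREATE TABLE "{self.table}" ({cols})')
        marks = ", ".join("?" for _ in header)
        self.conn.executemany(f'INSERT INTO "{self.table}" VALUES ({marks})', rows)
        self.conn.commit()
        return f"Loaded {len(rows)} rows into {self.table}"


# Maps a postprocessor name from a format definition to a factory.
REGISTRY: Dict[str, Callable[..., PostProcessor]] = {"sql": SqlLoadPostProcessor}

conn = sqlite3.connect(":memory:")
processor = REGISTRY["sql"](conn)
message = processor.process("SubjectKey\tI_WEIGHT\nSS_001\t72\nSS_002\t65")
```

Because the pipeline only depends on the process() contract, the user just sees the returned message, which matches the "data has been loaded" notification described above.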

Data export runs when a user or job requests data. The request includes the active study or site, the dataset ID, and the requested format. End users will notice only minor differences in how they use the Extract module. The process of creating datasets has not changed, and an extract can still be initiated from the ‘Download Data’ screen or from a job by selecting the desired output format. However, rather than waiting for the download page to load, the user is told that the extract has been queued, and receives an email and an on-screen notification when it is complete. Execution follows a four-step process:

Step 1.   Generate native CDISC ODM XML version 1.3 with OpenClinica Extensions

Step 2.   Apply XSL transformation and generate output file according to the settings in extract.properties for the specified format

Step 3.   Optionally, if postprocessing is enabled for the requested format, run the postprocessing action according to the settings in extract.properties.

Step 4.   Provide user notification with success or failure message.
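The four steps, together with the queueing described above, can be sketched as follows. This is a hypothetical Python outline, not OpenClinica's implementation: each step is injected as a callable so the control flow is visible on its own, and any exception in steps 1 to 3 becomes a failure notification in step 4.

```python
import queue
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ExtractRequest:
    study_oid: str    # active study or site
    dataset_id: int
    format_name: str  # requested output format


def run_extract(req: ExtractRequest,
                generate_odm: Callable[[ExtractRequest], str],
                transform: Callable[[str, str], str],
                postprocess: Optional[Callable[[str], str]],
                notify: Callable[[str], None]) -> Optional[str]:
    """Run steps 1-4 for one request; return the output, or None on failure."""
    try:
        odm = generate_odm(req)                   # Step 1: native ODM 1.3 + extensions
        output = transform(odm, req.format_name)  # Step 2: apply the format's XSL
        if postprocess is not None:               # Step 3: only if enabled for the format
            output = postprocess(output)
    except Exception as exc:                      # Step 4: failure message
        notify(f"Extract of dataset {req.dataset_id} failed: {exc}")
        return None
    notify(f"Extract of dataset {req.dataset_id} ({req.format_name}) succeeded")  # Step 4
    return output


def process_queue(pending: "queue.Queue[ExtractRequest]",
                  handle: Callable[[ExtractRequest], Optional[str]]) -> int:
    """Drain queued requests; users were told 'queued' when each was added."""
    handled = 0
    while True:
        try:
            req = pending.get_nowait()
        except queue.Empty:
            return handled
        handle(req)
        pending.task_done()
        handled += 1


# Usage with stand-in steps (all hypothetical).
messages = []


def demo_transform(odm: str, fmt: str) -> str:
    if fmt == "unknown":
        raise ValueError("no stylesheet configured")
    return f"{fmt}:{odm}"


pending = queue.Queue()
pending.put(ExtractRequest("S_DEMO", 7, "tab"))
pending.put(ExtractRequest("S_DEMO", 8, "unknown"))
count = process_queue(pending, lambda r: run_extract(
    r, lambda _: "<ODM/>", demo_transform, None, messages.append))
```

Catching errors around steps 1 to 3 and always reaching step 4 is what guarantees the user gets either a success or a failure message, never silence.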

We’ve also improved the logging and messaging around extracts, which will be crucial for anyone developing, customizing, or debugging XSL stylesheets. As always, full internationalization is supported: to internationalize a value in the extract.properties file, preface it with an ampersand (&), and place the corresponding text in the notes.properties i18n files.
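A minimal sketch of the ampersand convention, assuming the notes.properties bundle for the user's locale has already been loaded into a dict. The key names and the fallback to the bare key for untranslated values are assumptions made for illustration.

```python
from typing import Dict


def localize(value: str, bundle: Dict[str, str]) -> str:
    """Return the translated text for '&key' values, or the value unchanged."""
    if value.startswith("&"):
        key = value[1:]
        # Assumed fallback: show the bare key when no translation exists.
        return bundle.get(key, key)
    return value


# Hypothetical Spanish bundle entry.
spanish = {"tab_delimited_text": "Texto delimitado por tabulaciones"}
label = localize("&tab_delimited_text", spanish)
```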

As is common with software, we didn’t get to do everything we wanted in the first release of these capabilities. Some future features include:

  • Allow extract formats to be restricted to specific users, studies/sites, and/or datasets.
  • Allow loading and validation of formats within the web UI or via web services rather than via the extract.properties config file.
  • Create an exchange for XSL formats similar to the CRF Library.

Other than that, we think we’ve thought of everything :-). Have we?

– Cal Collins

The Open Source Effect: Akaza Research Provides Insight into Rapid Growth of OpenClinica

OpenClinica has seen a surge in usage over the past year, according to a recent survey conducted by Akaza Research.

“Our annual survey of the OpenClinica community showed strong expansion in all key measurements of system usage,” said Cal Collins, Chief Executive Officer at Akaza. “In the past year we have seen a doubling in the number of OpenClinica users and subjects, and a nearly 10-fold increase in regulatory submissions.”

The company reports that 168,989 subjects have been involved in OpenClinica-powered clinical trials, a 224 percent increase over the prior year. In tandem, the company identified a 246 percent increase in the number of OpenClinica software users. This figure counts users working at the sponsor or CRO level and does not include users at clinical trial sites.

“Since these figures are based on a voluntary survey of the OpenClinica community, they are likely underestimates,” said Collins. “While it can be difficult to measure the usage of freely distributed open source software precisely, these figures provide a clear indication of the growth in OpenClinica adoption around the world,” he added.

The Professional Open Source Model

OpenClinica stands in stark contrast to the landscape of other EDC products, which are provided under closed source licenses. Akaza Research’s “professional open source” business model makes OpenClinica available in two editions. The OpenClinica Community Edition is freely available to use and modify, and may be downloaded from www.openclinica.org. The OpenClinica Enterprise Edition is a certified build of the open source technology, commercially supported by Akaza Research. In many respects, the company’s business model is similar to that of Red Hat (Linux), MySQL (database software), and other open source companies.

OpenClinica’s rapidly growing open source community currently comprises over 10,500 users and developers, many of whom help review and adapt the open source software. Roughly 33 percent of OpenClinica users are located in North America, 30 percent in Europe, 14 percent in Asia, 9 percent in Africa, 7 percent in South America, and 7 percent in Australia. OpenClinica community members drive much of the product’s evolution, and in recent years have helped bring the technology into a wide variety of clinical trial settings.

Worldwide Acceptance in Regulated Trials

The composition of the OpenClinica community is changing over time, with an increasing number of OpenClinica users representing commercial clinical trials. Currently, 55 percent of OpenClinica community members identify themselves as working in industry, with the remainder in academic or government settings.

According to Collins, “the robust overall growth is highlighted by an increasing proportion of OpenClinica users representing pharmaceutical, biotech, device, and other companies. We saw a 975 percent increase in OpenClinica-powered trials used in regulatory submissions in the past year, and in the next 12 months OpenClinica adopters expect to increase this number by another 200 percent. This is consistent with our OpenClinica Enterprise Edition customer growth, where a majority of new customers are from industry.”

For more information about OpenClinica see the OpenClinica website.

Validation Approach for OpenClinica

Lately there has been quite a bit of discussion in the OpenClinica community about validation. The following paragraphs provide a basic overview of the key principles and components of a validation approach for OpenClinica.

In 21 CFR Part 11, the FDA requires validation of all systems that store or manipulate data that will be part of a regulatory submission. However, the agency provides few hard-and-fast rules on what constitutes acceptable validation. Coming up with a validation plan without outside help can be a painful and inefficient process. Akaza Research facilitates the validation of the OpenClinica electronic data capture software by providing standardized documents that make validating OpenClinica more efficient for our customers. Here’s a summary of what’s involved in validating OpenClinica or, for that matter, any other computer system used in clinical trials.

Because they are unsure what the FDA requires, sponsors tend to look for validation approaches that others have used successfully. Common validation practices in the industry are heavily influenced by a framework called GAMP, defined by the International Society for Pharmaceutical Engineering (ISPE). (GAMP originally stood for Good Automated Manufacturing Practice, but the framework has come to be used well beyond manufacturing equipment.)

A typical structure for validation under GAMP is to start with User Requirements Specifications, which drive Functional Specifications, which in turn inform the Design Specifications. The vendor’s work has to follow the specifications and the vendor’s Systems Development Lifecycle (SDLC).

Appropriate testing must follow test scripts that map back to the requirements. The individual piece of equipment (such as an OpenClinica server) is tested with Installation Qualification (IQ) by the vendor or the customer. This is essentially acceptance testing of the hardware. The next test is Operational Qualification (OQ) of the system, carried out by the vendor or customer. Finally, the customer carries out Performance Qualification (PQ). PQ has to be related to the User Requirements Specifications.

When Akaza implements its OpenClinica Enterprise solution for a customer, we carry out the IQ and OQ testing, and provide the signed test scripts together with a detailed report on the setup and configuration of the software.

It is generally not practical for users of off-the-shelf software to produce the User Requirements Specifications and PQ scripts themselves. For this reason, it is common for customers to base their specifications and PQ scripts on documents they obtain from the vendor. Akaza’s enterprise validation package includes electronic copies of these documents, as well as a traceability matrix that ties the PQ scripts back to the requirements. The customer can modify them as needed.
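The traceability matrix lends itself to a simple automated check. The Python sketch below, with invented requirement and script IDs, flags requirements that no PQ script covers and PQ scripts that cite requirements missing from the specification; it is an illustration only, not part of Akaza’s validation package.

```python
from typing import Dict, List, Set


def uncovered_requirements(requirements: List[str],
                           matrix: Dict[str, Set[str]]) -> List[str]:
    """matrix maps each PQ script ID to the requirement IDs it verifies."""
    covered = set().union(*matrix.values())
    return [req for req in requirements if req not in covered]


def unknown_references(requirements: List[str],
                       matrix: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """PQ scripts that cite requirement IDs absent from the URS."""
    known = set(requirements)
    return {script: refs - known for script, refs in matrix.items() if refs - known}


# Hypothetical URS items and PQ-to-requirement mapping.
urs = ["URS-001", "URS-002", "URS-003"]
trace = {"PQ-01": {"URS-001"}, "PQ-02": {"URS-001", "URS-004"}}
gaps = uncovered_requirements(urs, trace)
```

Running both checks in each direction catches the two common drifts: requirements added without a test, and tests left pointing at requirements that were renumbered or removed.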

The FDA expects the customer to run the PQ tests. You can probably also hire a third party to do this for you. While an experienced user can run through the OpenClinica PQ scripts in two or three days, it is reasonable to expect a novice to spend a week or so on it.

The other principal element of validation is making sure that the software was developed under a defined SDLC (Systems Development Lifecycle). Akaza’s customers do this by auditing our procedures. At a minimum, they review our Standard Operating Procedures, our training records, and the records of our internal test procedures. Some customers also review our software defect tracking and source code configuration management systems.

Validation of electronic record-keeping systems is a labor-intensive process, but it is an essential element of any submission to a regulatory body. Software vendors can make the process much more efficient for their customers. At Akaza Research, that’s one of the key things we do for users of OpenClinica to ensure they are compliant with regulations such as 21 CFR Part 11.