Beyond Single-Selects – Managing Long Lists in OpenClinica

Here at the VU Medical Center Amsterdam, we’re implementing OpenClinica (OC) for CTMM TRACER. TRACER aims at improving diagnosis, prognosis and therapy selection for rheumatoid arthritis. A series of eCRFs are being developed, ranging from questionnaires to joint scoring and DAS-score calculations.

One of our most recent CRFs is concerned with a patient’s medication. In this CRF one of the items of interest is the medication the patient is currently on. The code system employed for medication is the Anatomical Therapeutic Chemical (ATC) Classification System, which contains over 500 entries. Initially, we intended to create a single-select field containing the ATC codes and their description. Unfortunately, the number of characters involved is way more than the maximum of 4000 allowed by OC. The easy alternative would have been to define a free text field, but it is generally best to avoid those, as the data in these fields tends to pollute easily.

The solution we created employs JavaScript and JQuery. Without getting into the technical details (you can get them here), the approach is to create a (read-only) text field in OC and let a non-OC single-select item write data to this field. This ensures that the values written to OC are limited to those specified in the list, without having to store the complete list in OC. The list entries themselves are stored in an external XML file, which is stored on the OC server. If at some point the ATC-codes need to be updated, we can simply update the XML file and the updates will be available for the users.

Whereas our problem was concerned with a long list of ATC-codes, the solution can be applied to any list which is longer than OC’s maximum number of characters. All you have to do is create an XML file, which describes the list, make some minor changes to the example code provided on the wiki page and upload two files to your OC server.

Sander de Ridder (MSc Computer Science, MSc Bioinformatics)

The OpenClinica Platform – Developer Round Table Discussions

OpenClinica is a clinical trial software platform that aims to provide data capture, data management, and operations management functionality to support human subjects-based research. It can be used for traditional clinical trials as well as a wide variety of other types of human subjects-based research.

Our vision for the product is to provide data capture, data management, and operations management functionality out-of-the-box, in an easily configurable, usable, and highly reliable manner. The underlying platform should be interoperable, modular, extensible, and familiar – so users can solve specific problems, in a generalizable way.

This past spring, the development team here at OpenClinica, LLC held a series of round table discussions about how this vision is reflected in the product. Our goals were to learn critical standards and information models needed for our technology to truly reflect this vision, to develop a consistent, shared vocabulary for the problem domain and the OpenClinica technology, and identify the most urgent opportunities to put these lessons into practice in the product and the community. In particular, we spent a lot of the time in these discussions about how OpenClinica’s use of the CDISC Operational Data Model helps enable this vision.

The discussions were invigorating and thought-provoking. We’ve recorded them to share with the greater community of OpenClinica developers, integrators, and others who want to better understand how the technology works, the design philosophy behind it, and where we’re going in the future. The videos are embedded below.

But before getting to the videos, here’s a bit more background on how we think about OpenClinica as a product and a platform.

First, OpenClinica functionality should be ready out-of-the-box, easily configurable and highly usable. Some of the most important features include:

  • Data definition and instrument/form design with no or minimal programming
  • Sophisticated data structures such as repeating items and item groups
  • Support for a wide variety of response types and data types (single select, multiple choice, free text, image)
  • Data management and review capabilities (including discrepancy management and clinical monitoring) with flexible workflows
  • ALCOA-compliant controls and audit history over all data and metadata, including electronic signature capabilities
  • Patient visit calendar design with management of multiple patient encounters and multi-form support
  • Reporting and data extract to a wide variety of formats (tab, SPSS, CDISC ODM)
  • Ability to combine electronic patient reported outcome (ePRO) data with clinically reported data using common form definitions (write once, run anywhere)
  • Deployment via multiple media, mobile or standard web browser

Many of these things have already been implemented, and more are under development.

The core concept around which OpenClinica is organized is the electronic case report form (CRF). In OpenClinica, a CRF is a resource that is essentially a bunch of metadata modeled in CDISC ODM with OpenClinica extensions. It doesn’t (necessarily) have to correspond to a physical or virtual form; it may represent a lab data set or something similar. An OpenClinica Event CRF is that same bunch of metadata populated with actual data about a particular study participant. Thus, it combines the metadata with the corresponding item (field) data, with references to the study subject, event definition, CRF version, and event ordinal that it pertains to. In this conceptual view of the world, CRFs (as well as CRF items, studies, study events, etc.) are resources with core, intrinsic properties and then some other metadata that has to do with how they are presented in a particular representation. Built around these core resources are all the workflow, reporting, API, security, and other mechanisms that allow OpenClinica to actually save you time and increase accuracy in your research.

Second, OpenClinica should be interoperable. The ultimate measure of interoperability is having shared, machine readable study protocol definitions, and robust, real-time, ALCOA-compliant exchange of clinical data and metadata that aligns with user’s business processes. It should be easy to plug in and pull out or mix-and-match different features, such as forms, rules, study definitions, report/export formats, and modules, to transport them across OpenClinica instances or interact with other applications. Establishing well defined methods and approaches for integration into existing health data environments is a key goal of interoperability.

Third, OpenClinica should be modular and extensible. OpenClinica already provides some of the most common data capture and data management components and aims to have a very broad selection of input types, rules, reports, data extracts, and workflows. However OpenClinica developers should also have the freedom to come up with their own twist on a workflow, module, or data review workflow and have it work easily and relatively seamlessly with the rest of OpenClinica. User identification, authentication, and authorization should be highly configurable and support commonly used general purpose technologies for user credentialing and single-sign-on (such as LDAP & OAuth).

The CRF-centric model allows us a great deal of flexibility and extensibility. We can support multiple modalities, with different representation metadata for rendering the same form, or perhaps the shared representation metadata but applied in a different way (i.e. web browser vs. mobile vs. import job). We can address any part of the CRF in an atomic, computable manner. This approach has been successfully applied in the Rule Designer, which takes the ODM study metadata and allows browsing of the study CRFs and items, with the ability to drag and drop those resources to form rule expressions. Features such as rules and report/export formats are represented as XML documents. These documents define how the features behave in standardized ways so that one rule can, say, be easily replaced with another rule without having to modify all the code that makes use of the rule.

Finally, OpenClinica aims to be familiar in the sense of allowing data managers, developers, statisticians to work in a design/configuration/programming environment that they already know. Programmers don’t all have the same experience, and it would be somewhat limiting to force OpenClinica developers to all use the same language (Java) that OpenClinica was written in. We are constantly looking at ways to make it possible (not to mention reliable and easy!) for users and developers to interact with and extend OpenClinica in a programmatic way. This can mean anything from data loading to more meaningful integrations of applications common to the clinical research environment. As proponents of open, standards-based interoperability, our starting point is always to develop interfaces for these interactions based on the most successful, open, and proven methods in the history of technology – namely the protocols that power the World Wide Web (such as HTTP, SSL, XML, OAuth 2.0). They are relatively simple, extensively documented, widely understood, and well-supported out of the box in a large number of programming and IT environments. On top of this foundation, we rely heavily on the wonderful work of CDISC and the CDISC ODM to model and represent the clinical research protocol and clinical data.

Session 1:  from 30-March-2012 (start at the 5 min 20 sec mark)

Session 2:  from 06-April-2012 (start at the 1 min 25 sec mark)

Session 2a:  from 20-April-2012

Session 3:  from 27-April-2012

Session 4:  from 11-May-2012

Three Things I Learned About Export Job Performance From My Uncle Paulo

1. “Realize you can try to do it all…but you’ll do it slowly, very slowly.”

It’s understandable that there may be a desire to export every single dimension, for every CRF in your study, for all time. However, if that export takes 15 hours, then a job starting during off-hours may impact users the next business day, or worse, users in an alternate time-zone who may just be starting their work day.

It’s better to first get a sense of how long a small subset (e.g. reducing the temporal scope to a month) of data will take, manually, and take the elapsed time for the export to complete and multiply by the scope quotient (i.e. estimated total scope divided by reduced scope) to figure out a rough estimate of how long your export may take.

2. “Those temporal scope fields are your friend.”

If you don’t choose a temporal scope for your dataset, you will get data for a wider timeframe than you’re likely to be alive. This is may be fine for a dataset with a few dimensions, or if you happen to be looking for specific data you aren’t sure you may capture accurately with too narrow of a time scope.

However, if you KNOW what the scope is, you should specify this. This is more important with very large dataset. Breaking the dataset down into smaller scoped datasets will give you the flexibility of manageable chunks of data to schedule for export.

3. “Let your jobs breathe.”

You wouldn’t schedule two appointments, one after the other, if you didn’t know what time the first appointment would end, would you? The same idea applies to scheduled export jobs. While there’s nothing inherently wrong with scheduling one job after the other, you can only get away with this when you have a clear sense of when the first job may end. If you don’t have any idea when the first job may end, while the first job may complete successfully, the second will not if it is overrun by the first. When in doubt, give your jobs enough time to complete, by spacing them out.

Remember, it’s better to measure twice, and cut once. (My uncle Paulo didn’t actually say this last one.)

- Tope Oluwole

Thoughts on Code: OpenClinica and Open Standards with CDISC

One of the strengths of open source is the ability to open up the code base and learn by reading and doing, that is, the transparency of the code base allows everyone to get involved. However, the barrier to entry can be the complexity of the code itself; without a qualified guide, you can get ‘lost in the code jungle’ pretty quickly.

Welcome to our code

With that in mind, we are starting today to author blog posts about the OpenClinica code base, including topics like how the code is organized, what the code does, and so on. A lot more detail on this can be found on the OpenClinica Developer Wiki, but these posts, viewed as a whole, can be seen as a gentle introduction, before interested parties start to dive deeper.

When we began to design OpenClinica, we had very few requirements, but the desire  to create a fully-featured database for clinical data, aligned with open standards, making use of the best technology available. Call it the ‘tyranny of the blank page’, if you will. Every start-up faces it. Where do you start? What’s the plan? How do you build it, and what do you build first?

Luckily for us, we could use an open standard to base our schema, and our code, on top of; the CDISC ODM.

What’s a CDISC ODM?

The Operational Data Model, or ODM for short, is a standard published by the Clinical Data Interchange Standards Consortium (CDISC), and is “designed to facilitate the archive and interchange of the metadata and data for clinical research”, as it states in their website. This is a standard which is designed to a) hold metadata about a Study and all Events contained within a given Study, and b) hold Clinical Data which has been collected for a given Study. All of this information is held in XML, which is a very useful format for exchanging between sites, labs and institutions.

Figure 1: Study Metadata and OpenClinica

In the above image, you can see an XML file on one side using CDISC ODM and on the other side, an OpenClinica database. Inside the database are tables that map directly to different objects described in the XML. You’ll notice that the tables associated with study metadata also have a column called ‘oc_oid’, which are the Object Identifiers we use in all aspects of the OpenClinica application.

Figure 2: ODM Clinical Data and OpenClinica

In the second image, you see that the latter half of the XML file (the part  contained in the <ClinicalData> tags) also links to specific tables in the OpenClinica database. Since we link back to the Study metadata through those OIDs, we don’t use OIDs in those tables, but instead the conventional methods of primary keys and foreign keys in the database is good enough.

OK, so they map. But where’s the beef?

Of course, the ODM XML in the images is rather simple, and does not capture the full capability of the metadata that can be passed back and forth between different ODM data sources. For a longer example, you can take a look at the following XML, which defines the Rules governing a single Item:

Sample ItemDef in CDISC ODM XML

As you start to piece together the XML in the above example, you’ll see that not only can you define the Question in multiple languages, but you can specify which measurement it is using and what kinds of values you can accept.  The XML standard is extensible enough to add other pieces of information as well, including coded lists, data types, and so on.  More information can be found at XML4Pharma’s page entitled, ‘Using CDISC-ODM in EDC.’

In future posts, we hope to describe more about the code base, and show how it all comes together as a full-featured application. If there are topics that are of specific interest, we hope you’ll comment below and let us know what you’d like to see here in the coming months.

The Evolution of Electronic Data Capture

OpenClinica was recently featured in an article in Genetic Engineering and Biotechnology News titled “Commandeering Data with EDC Systems,” written by Dr. James Netterwald. The article briefly recounts the early days of clinical trial Electronic Data Capture (EDC). But how far have we come? Dr. Netterwald’s title (perhaps unintentionally) conjures up images of struggle and strife, which may be perhaps more a more apropos description of the journey of Electronic Data Capture than it may first appear.

As an industry, it’s taken us a good 20 years to get to where we are, and to be plain, it’s been a slow start. (In my own defense, I, and my company Akaza Research, have only been a party to the industry for the last 5 of those 20 years.) Climbing the evolutionary ladder from shipping laptops to sites to keying data into electronic case report forms is certainly progress by any measure. However, while the days of mailing tapes and disks are over, the days of real electronic data capture are yet to come. Today, most experts agree that somewhere between only one-half and two-thirds of all new clinical trials use EDC software, an of this only a very small fraction are “e-source,” defined as collecting data in electronic form at its source as opposed to keying it in from some other source. In some ways it is ironic that cutting-edge biopharmaceutical technologies are developed themselves with technologies that are, relatively speaking, much further down the technology food chain.

Notwithstanding, there are some enterprising few who have pushed the pace towards true EDC. Spaulding Clinical, a large phase 1 unit in West Bend, Wisconsin has developed a system that automatically captures ECG data from their facility’s patients and directly populates the clinical trial database with these data. A patient wears the ECG device and the data are transmitted wirelessly to the EDC system. However, this slick and highly productive solution was not developed by either the ECG vendor or the EDC vendor. It was developed by hand by one of Spaulding’s own software developers.

Why isn’t this type of solution more commonplace in clinical trials? What prevents the industry from making the most of today’s information technology? With the strong incentives currently in place to make research more efficient, our field could certainly benefit from some more forward thinking.

- Ben Baumann

OpenClinica 3.1 (project “Amethyst”) Preview

The next major milestone for OpenClinica (project code-named “Amethyst”) will be OpenClinica version 3.1. While this release contains roughly 85 tweaks, fixes, and enhancements, this post describes some of the more significant enhancements that will be included (check out the Amethyst project roadmap page for a full list—note: openclinica.org login required).

New features in OpenClinica 3.1 will include:

  • Dynamic Items in CRFs
  • Audit Log and Discrepancy Note Data with ODM Exports
  • Assignment of Failed Validation Check Discrepancy Notes
  • Relative Paths for Item OIDs in Rules
  • Discrepancy Note Flag Colors in the CRF
  • Modularization
  • Multiple Site Level Users Allowed Access to an Event CRF
  • Simple Conditions

Dynamic Items in CRFs

OpenClinica 3.1 will support Showing and Hiding of CRF Items using both the CRF template and a Rules file. When a user enters data and saves a CRF section, data fields may be shown or hidden based on value(s) that user provided in one or more other fields on that form. This capability is also commonly referred to as “skip patterns.” A simple example would be a CRF question asking if the patient is male or female. If the value provided is female, a pregnancy related question could then be displayed to the user entering data and all questions associated with males could be hidden from view.

Audit Log and Discrepancy Note Data with ODM Exports

Audit Log data and Discrepancy Note data for a subject will be available in the ODM data extract format with OpenClinica extensions. This will allow the user to have all of the clinical data, audit log data, and discrepancy note data in a single data file.

Assignment of Failed Validation Check Discrepancy Notes

Currently, OpenClinica supports assignment and workflows around only the Query type of Discrepancy Note. OpenClinica’s “assignment” capability will be expanded to also include the Failed Validation Check type of Discrepancy Note. For Failed Validation Checks, the first note in the Discrepancy Note thread will not be assigned, but Data Managers, Study Directors and Monitors will be allowed to assign the thread to a user for review/resolution. OpenClinica will also restrict the Clinical Research Coordinators and Investigators (both site-level users) from setting the status of Failed Validation Checks to Closed.

Relative Paths for Item OIDs in Rules

A new enhancement to Rules authoring will allow the Rule creator to write one Rule Assignment for a particular CRF Item, and have the Rule execute wherever that Item’s OID is used throughout all of the study events. This increases the “portability” of rules, allowing the user to write one Rule, and have it apply multiple times rather than having to author multiple Rules and multiple Rule Assignments.

As an example, a Rule Assignment would need to contain the following path in OpenClinica 3.0.x:

SE_OBSERVAT[ALL].F_GROU.IG_GROU_GROUP_1[ALL].I_GROU_TC_ADV_PRIMARY_05

If the CRF was used in multiple events, the creator of the Rule file would have to specify the path to the other event as well.

In 3.1, all the user has to do is write:

I_GROU_TC_ADV_PRIMARY_05

and the Rules will be executed wherever that Item shows up in the study.

Discrepancy Note Flag Colors in the CRF

OpenClinica 3.1 will include additional Discrepancy Note flag colors that correspond to the various statuses of a particular thread. Currently in OpenClinica, if a Discrepancy Note thread exists, the flag will always display in a red color regardless of the Discrepancy Note status. In 3.1, the color of the flag will be reflect the “most severe” status of any thread that is on a particular item (more than one thread may exist for any item). For example, if there is a Closed thread and an Updated thread on one item, the color of the flag will be yellow representing the Updated status. If there is just a Closed thread, the color of the flag will be black. To support people who are color blind or shade blind (like myself) there will be a roll over when you put your mouse on the flag, showing you the number of threads and each of their statuses when Viewing a CRF.

Modularization & New Web Services

Modularization is defined as a software design technique that increases the extent to which software is composed from separate parts, called modules. Conceptually, modules represent a separation of concerns, and improve maintainability by enforcing logical boundaries between components.

What this means for OpenClinica is we have started to separate the application into multiple pieces. In version 3.1, we have modularized the web application from the web services functionality. This will allow new web services to be developed on separate release timelines from the main web application, facilitating the system’s extensibility.

In addition, we are release some web services with 3.1, (these were contributed by Colton Smith of Geneuity and openXdata):

  • CRF Data Import
  • Retrieve a list of studies
  • Retrieve a list of events and their CRFs
  • Retrieve a list of subjects and their events

Multiple Site Level Users Accessing an Event CRF

OpenClinica 3.1 will allow different site level users access to an Event CRF, even if they are not the conceptual “owner” of that CRF. In prior versions of OpenClinica, once a user began data entry in a CRF, the system prevented other users from adding information or data to the CRF until it had been marked complete. The new feature will allow a second user to continue entering data before the CRF is marked complete.

This change to OpenClinica will also help facilitate the ease of recording adverse events in a separate CRF. A user will not have to mark it complete in order for another user to provide additional adverse events that have occurred for a particular subject. In addition, this new functionality will prevent users from accessing an Event CRF if another user already has the form open. In this case, the second user will receive a message saying that the form is currently being accessed by another user.

Simple Conditions

In addition to the Dynamics capabilities that will be part of 3.1, we have added a feature called Simple Conditions. This feature is similar to Dynamics in many ways, but can be implemented through the CRF Template directly rather than writing a separately Rules XML document.

With Simple Conditions, a person creating an OpenClinica CRF will have the ability to designate Items as “hidden” from the data entry person’s view until a particular option is chosen for another CRF Item. The data entry person will not have to click “save” on the form–instead, as soon as the option is selected, this hidden field will be shown in real time.An example of the type of use case this feature targets, is a CRF question with two fields, one for “race” and the other for “comments” (which is hidden). If the data entry person selects the value of “other” for the race field, the hidden comments field will be display underneath.

Akaza Research is excited about bringing OpenClinica 3.1 to the community! Your comments and feedback are appreciated. Please check back in next week or so for an update on our timelines for Alphas, Betas and a Production release.

OpenClinica CRF Library is Now Available!

The OpenClinica CRF Library is live! Login with your OpenClinica.org account to search, download, and share CRFs at library.openclinica.org.

Or, read more about the library here.

Follow

Get every new post delivered to your Inbox.

Join 2,055 other followers