Thoughts on Code: OpenClinica and Open Standards with CDISC

One of the strengths of open source is the ability to open up the code base and learn by reading and doing; the transparency of the code allows everyone to get involved. However, the complexity of the code itself can be a barrier to entry. Without a qualified guide, you can get ‘lost in the code jungle’ pretty quickly.

Welcome to our code

With that in mind, today we are starting a series of blog posts about the OpenClinica code base, covering topics like how the code is organized, what the code does, and so on. A lot more detail can be found on the OpenClinica Developer Wiki, but these posts, taken as a whole, can serve as a gentle introduction before interested parties start to dive deeper.

When we began to design OpenClinica, we had very few requirements beyond the desire to create a full-featured database for clinical data, aligned with open standards and making use of the best technology available. Call it the ‘tyranny of the blank page’, if you will. Every start-up faces it. Where do you start? What’s the plan? How do you build it, and what do you build first?

Luckily for us, there was an open standard on which we could base our schema and our code: the CDISC ODM.

What’s a CDISC ODM?

The Operational Data Model, or ODM for short, is a standard published by the Clinical Data Interchange Standards Consortium (CDISC), and is “designed to facilitate the archive and interchange of the metadata and data for clinical research”, as stated on its website. The standard is designed to a) hold metadata about a Study and all Events contained within that Study, and b) hold the Clinical Data that has been collected for that Study. All of this information is held in XML, which is a very useful format for exchanging data between sites, labs and institutions.
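To make the two halves of an ODM file concrete, here is a minimal sketch in Python. The XML is a heavily trimmed, hypothetical example (a real ODM file carries more required attributes and elements, so this one would not validate against the schema), the OIDs are invented for illustration, and we are assuming the ODM 1.3 namespace. The function name split_odm is ours, not part of OpenClinica.

```python
import xml.etree.ElementTree as ET

# Assumption: the document uses the ODM 1.3 namespace.
ODM_NS = "http://www.cdisc.org/ns/odm/v1.3"

# Hypothetical, trimmed ODM document: one Study (metadata half)
# and one ClinicalData block (collected-data half).
ODM_XML = f"""<ODM xmlns="{ODM_NS}">
  <Study OID="S_DEMO">
    <MetaDataVersion OID="v1.0.0" Name="Demo metadata">
      <StudyEventDef OID="SE_VISIT1" Name="Visit 1" Repeating="No" Type="Scheduled"/>
    </MetaDataVersion>
  </Study>
  <ClinicalData StudyOID="S_DEMO" MetaDataVersionOID="v1.0.0">
    <SubjectData SubjectKey="SS_001">
      <StudyEventData StudyEventOID="SE_VISIT1"/>
    </SubjectData>
  </ClinicalData>
</ODM>"""


def split_odm(xml_text):
    """Return (event definition OIDs, [(subject key, event OID), ...])."""
    root = ET.fromstring(xml_text)
    ns = {"odm": ODM_NS}

    # Metadata half: which Events does the Study define?
    event_defs = [
        e.get("OID")
        for e in root.findall("odm:Study/odm:MetaDataVersion/odm:StudyEventDef", ns)
    ]

    # Clinical data half: which subjects have data for which Events?
    collected = [
        (subject.get("SubjectKey"), event.get("StudyEventOID"))
        for subject in root.findall("odm:ClinicalData/odm:SubjectData", ns)
        for event in subject.findall("odm:StudyEventData", ns)
    ]
    return event_defs, collected
```

Note that the clinical data never repeats the definition of an Event; it only points back to it through the OID.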

Figure 1: Study Metadata and OpenClinica

In the above image, you can see an XML file using CDISC ODM on one side and an OpenClinica database on the other. Inside the database are tables that map directly to the different objects described in the XML. You’ll notice that the tables associated with study metadata also have a column called ‘oc_oid’, which holds the Object Identifier (OID) we use in all aspects of the OpenClinica application.

Figure 2: ODM Clinical Data and OpenClinica

In the second image, you see that the latter half of the XML file (the part contained in the <ClinicalData> tags) also links to specific tables in the OpenClinica database. Since we link back to the Study metadata through those OIDs, we don’t store OIDs in the clinical data tables; the conventional database mechanism of primary keys and foreign keys is good enough there.
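The pattern described here, OIDs on the metadata tables and plain keys on the data tables, can be sketched with an in-memory SQLite database. This is an illustration only: apart from the ‘oc_oid’ column mentioned above, the table names, column names and helper functions are hypothetical and are not taken from the actual OpenClinica schema.

```python
import sqlite3


def build_demo_db():
    """Create a tiny, hypothetical two-table schema in memory."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Metadata table: carries the OID used in the ODM XML.
        CREATE TABLE item (
            item_id INTEGER PRIMARY KEY,
            oc_oid  TEXT UNIQUE NOT NULL,
            name    TEXT NOT NULL
        );
        -- Clinical data table: no OID column, only a foreign key.
        CREATE TABLE item_data (
            item_data_id INTEGER PRIMARY KEY,
            item_id      INTEGER NOT NULL REFERENCES item(item_id),
            value        TEXT
        );
        """
    )
    return conn


def store_value(conn, item_oid, value):
    """Resolve an OID from the XML to a primary key, then store the value."""
    row = conn.execute(
        "SELECT item_id FROM item WHERE oc_oid = ?", (item_oid,)
    ).fetchone()
    if row is None:
        raise KeyError(f"unknown item OID: {item_oid}")
    conn.execute(
        "INSERT INTO item_data (item_id, value) VALUES (?, ?)", (row[0], value)
    )


def values_with_oids(conn):
    """Join back to the metadata to recover the OID for each stored value."""
    return conn.execute(
        """
        SELECT i.oc_oid, d.value
        FROM item_data AS d
        JOIN item AS i ON i.item_id = d.item_id
        ORDER BY d.item_data_id
        """
    ).fetchall()
```

The OID is looked up once on the way in; when exporting, a join on the foreign key brings it back, so nothing is lost by leaving it out of the data table.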

OK, so they map. But where’s the beef?

Of course, the ODM XML in the images is rather simple, and does not capture the full capability of the metadata that can be passed back and forth between different ODM data sources. For a longer example, you can take a look at the following XML, which defines the Rules governing a single Item:

Sample ItemDef in CDISC ODM XML

As you start to piece together the XML in the above example, you’ll see that not only can you define the Question in multiple languages, but you can also specify which measurement unit it uses and what kinds of values it can accept. The XML standard is extensible enough to add other pieces of information as well, including coded lists, data types, and so on. More information can be found on XML4Pharma’s page entitled ‘Using CDISC-ODM in EDC.’
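For readers who want to poke at an ItemDef themselves, here is a small Python sketch that pulls out the pieces just mentioned. The ItemDef below is our own hypothetical example, not the sample referred to above, and it assumes the ODM 1.3 namespace; the OIDs, question text and the describe_item helper are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Assumption: ODM 1.3 namespace.
ODM_NS = "http://www.cdisc.org/ns/odm/v1.3"
# ElementTree exposes xml:lang under the standard XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical ItemDef: a question in two languages, a unit, a range check.
ITEM_DEF_XML = f"""<ItemDef xmlns="{ODM_NS}" OID="I_WEIGHT" Name="Weight" DataType="float">
  <Question>
    <TranslatedText xml:lang="en">Body weight</TranslatedText>
    <TranslatedText xml:lang="de">Gewicht</TranslatedText>
  </Question>
  <MeasurementUnitRef MeasurementUnitOID="MU_KG"/>
  <RangeCheck Comparator="GE" SoftHard="Soft">
    <CheckValue>0</CheckValue>
  </RangeCheck>
</ItemDef>"""


def describe_item(xml_text):
    """Summarize an ItemDef: question per language, unit and range checks."""
    item = ET.fromstring(xml_text)
    ns = {"odm": ODM_NS}

    # One question text per language code.
    questions = {
        t.get(XML_LANG): t.text
        for t in item.findall("odm:Question/odm:TranslatedText", ns)
    }

    # The unit is referenced by OID, not defined inline.
    unit = item.find("odm:MeasurementUnitRef", ns)

    # Each range check is a comparator plus one or more check values.
    checks = [
        (rc.get("Comparator"), [cv.text for cv in rc.findall("odm:CheckValue", ns)])
        for rc in item.findall("odm:RangeCheck", ns)
    ]

    return {
        "oid": item.get("OID"),
        "data_type": item.get("DataType"),
        "questions": questions,
        "unit_oid": unit.get("MeasurementUnitOID") if unit is not None else None,
        "range_checks": checks,
    }
```

Calling describe_item(ITEM_DEF_XML) returns a plain dictionary, which is a convenient shape for comparing what the XML says against what ends up in the database.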

In future posts, we hope to describe more about the code base, and show how it all comes together as a full-featured application. If there are topics that are of specific interest, we hope you’ll comment below and let us know what you’d like to see here in the coming months.

Motivations for Contributing to Open Source

There are currently over 50 different types of open source software licenses approved by the OSI (Open Source Initiative).[1] One consistent theme these licenses share is that they encourage contributions from a community of users and developers. In numerous instances these contributions have proved significant and resulted in the establishment of some of the most dominant technologies on the Web today, such as Apache, Linux, PHP, Java, MySQL, and SugarCRM. What are the factors that compel people to contribute to these projects? It seems the motivation comes from two sources: organizations and individual developers.

Open source can be strategic to organizations in several ways. For example, in the clinical research industry, contract research organizations (CROs) might incorporate an open source clinical data management system like OpenClinica into a complete clinical trial solution offered to their customers. Building OpenClinica into part of a larger infrastructure may involve adding to or modifying the software in some way. Organizations doing this have a vested interest in contributing their improvements back to the broader community, to ensure these enhancements are supported in future distributions of the software. In this way, an organization can leverage a freely available software product for its own, customized purposes while avoiding “forking” the software into a unique product it might be stuck maintaining itself.

While there may be a solid business rationale for an organization to use and contribute to open source, ultimately the software’s improvements come from individual developers. What motivates individual developers to contribute to an open source project? Obviously, any company requiring its developers to work on an open source product for the company’s own purposes is providing one type of motivation. However, many open source projects largely comprise developers who volunteer their time outside of their capacity as employees. History has shown that over time these volunteers have produced some of the most significant and sustained successes in the software world.

Take, for instance, the Apache project. Apache, the world’s most popular web server, began in 1995 as an outgrowth of the NCSA web server developed at the University of Illinois, Urbana-Champaign, and today powers nearly 50 percent of all websites worldwide.[2] While the software’s development is the result of an ongoing volunteer effort, the community has evolved an organizational structure that appears to engender a motivational atmosphere among developers.[3] For example, a study by Il-Horn Hann and colleagues at the University of Southern California found that the salaries of Apache project contributors correlated positively with their rank in the Apache organization. This ranking, the authors suggest, serves as an indication of a developer’s productivity and market value to an employer.[4]

Many developers may of course contribute to an open source project out of intellectual curiosity or pure altruism. However, it seems the basic principles of economics can help to intensify the desire to contribute. Regardless of any one party’s motivation, it is undeniable that the meritocracy inherent in open source is an intriguing, if not highly effective, paradigm for software development that continues to have a significant impact on modern computing.


[1] http://www.opensource.org/licenses/alphabetical

[2] http://news.netcraft.com/archives/2008/06/22/june_2008_web_server_survey.html

[3] The not-for-profit Apache Software Foundation helps to organize and coordinate the Apache open source community.

[4] I-H. Hann et al., “Economic Returns to Open Source Participation: A Panel Data Analysis,” unpublished working paper, Univ. of Southern California.