Register for OC17 today and secure early bird pricing!


We are thrilled to announce the venue and keynote speaker for OC17, our 9th Annual Global Conference, and to open registration for the event.

OC17 Theme and Dates
“Making the Complex Simple”
December 4 – 5 | Sessions and Workshops
December 6 – 8 | Super User Training

Mövenpick Hotel Amsterdam
Piet Heinkade 11
1019 BR  Amsterdam

Keynote Speaker
Dr. Andrew Bastawrous
Co-Founder and CEO
Peek Vision

Further details, pricing, and a registration form are just a click away. Reserve your spot now to lock in early bird pricing.

This year’s theme is “Making the Complex Simple,” and we are proud to offer a space and speaker exemplifying that theme. The Mövenpick is a short taxi ride from the international airport in Amsterdam. We are confident that your stay there will delight with its simple elegance. We are especially honored to welcome Dr. Bastawrous as our keynote speaker. His story on the impact that innovation can have must be seen to be believed.



We hope you can join us in December as we turn the spotlight on our incredible user community once again. Do not hesitate to email with any questions regarding OC17.

Ben Baumann, COO

14 Best Practices for ePRO Form Design

Is there any term in data collection more despairing than “abandonment rate”? That’s the percentage of respondents who start inputting data into a form but don’t complete the task. Sometimes, it’s hard not to take that kind of rejection personally.

Sadly, it’s just one problem among dozens that hinder the collection of timely, clean and complete data directly from study subjects. Data managers face challenges from the start. A participant’s compliance with study procedures (and data entry) is always only voluntary, and apart from the occasional stipend, these participants rarely receive compensation in dollars. That’s not to say that the care provided in the course of a study isn’t of value to patients. But that “quid pro quo” mindset isn’t easy to maintain outside the clinical site. Paper diaries and even electronic forms ask a participant for their time and attention, commodities that are in short supply these days. As an industry, we can’t stop looking for ways to minimize the burden on participants. Not if we want to maximize their contribution.

In previous posts, we explored BYOD as a preferred approach to collecting ePRO data. But that’s only half the story. What good are friendly, motivational messages to a participant’s mobile device and computer, if the form to which they’re directed is needlessly long or convoluted? That’s a recipe for heartbreak (form abandonment), or at least a host of trust issues (incomplete or poor quality data).

So, what are the keys to getting your participants to enter all their PRO data, accurately and on time? Not surprisingly, they’re not too different from those we rely on in any budding relationship. Below, I bundle them into four categories.

Make a good first impression

Imagine yourself as a job interviewee. The hiring manager asks you to rattle off the top three responsibilities in every role you’ve ever filled. How coherent will your answer be? A practiced interviewer, interested in getting to know you, would start differently. “Tell me, what’s your top priority in your current role?” She’ll need to gather a lot more information before the end of the interview, but she’s set out a comfortable pace for getting it.

The lesson for ePRO form design is clear.

1. Start with a single question (or a very small number of questions) that can be answered simply. Present these one to three questions, and no more, on the first page.

This best practice is one example of a broader rule: always proceed from the simple to the complex. Don’t ask the participant to choose one of sixteen options. Rather, ask two yes or no questions that will reduce the options first to eight and then to four, and have the user pick from that list. Yes, the latter scenario involves more clicks, but it involves less cognitive labor and less scrolling.

Dress to impress

The “e” in “ePRO” isn’t usually capitalized, but maybe it ought to be. Leave the paper mindset behind. Spatial and color constraints no longer apply, and you can present information in a way that prevents the participant from looking ahead (and becoming discouraged). In short, you’re free to give respondents a form-reading and form-filling experience they actually enjoy. Here are some pointers on visual cues that work with the eyes and minds of your participants:

2. Black or dark grey text on a white background is always the safest default for instructions and form labels. For section headings and buttons, a vibrant color that contrasts sufficiently with the background can help orient the respondent.

3. If you have a choice of fonts, sans serif is preferable. Why? While studies do not point to a clear winner for readability in all contexts, evidence suggests that lengthy text passages that ask the reader to parse several ideas at once are best served by serif fonts, while short prompts and labels are best conveyed with a sans serif font. (And you already know that short and direct is best, right?) The wide variety of serifs in existence makes character recognition more difficult for non-native readers, while sans serifs are more likely to remain readable when scaled down.

4. Place the field label above, and left justified with, any field requiring the respondent to type. This “stacked” format suits the portrait orientation with which most smartphone users hold their device. Placing the field label inside the field for the user to type over may save space, but it also causes your label to disappear once the user interacts with the field.

5. Avoid grids. There are contexts where a grid layout is preferable; for example, when recording multiple instances of body position, systolic pressure and diastolic pressure for a patient undergoing blood pressure tests. But these are almost always in-clinic scenarios. For the collection of ePRO data, stay within a single column layout.

6. Paginate. One screen containing six questions, or two screens containing three each? All things being equal, fewer questions on more pages is preferable. Why? Clicking next is easier than scrolling. Also, breaking up a large set of questions into two smaller ones reduces the intimidation factor.

7. Use images as selection options when text could lead to misinterpretation. Not all form engines support this feature, but the ability to present an image in place of (or with) a checkbox or radio button is more than a “nice to have.” Which of the following is more likely to yield quality, computable data?





I’ve relied on a few best practices in this example, starting with some very simple questions in order to narrow down the remaining ones and present them in a manner that gives us clean, accurate, and computable data. But the images in particular are what rescue the participant from needless typing and potential confusion. By making the regions within the head illustration clickable, I can capture very discrete data without taxing the participant’s time.

Respect their time

“Press C to confirm or X to cancel.” That’s a familiar formula to anyone who’s received a text message before an appointment. These are easy to appreciate. If I feel I owe any response to the sender, I’m more likely to complete this pseudo-form than any other.

Chances are, however, you may need a little more data than this from your participants. Here’s how you can gather it while respecting your participant’s time.

8. Minimize the number of fields. This advice may seem obvious and simple, but it’s neither. So long as a participant’s date of birth has been documented once, having them input age is never necessary. And if your system is capturing metadata precisely (e.g. a date and time stamp for every change to a field’s content), then you don’t need to ask the participant to record this information. In general, before adding a field, it is helpful to ask:

  • Do I really need this information? If yes, then…
  • Can I reliably deduce or calculate it based on prior data? If no, then…
  • Do I need the participant to supply it? If (and only if) yes, then include the field.
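The second question in this checklist can often be settled with a few lines of code. Here is a minimal sketch in Python of deriving age from a date of birth that is already on file. The function name `age_on` is illustrative and does not belong to any particular EDC system.

```python
from datetime import date


def age_on(date_of_birth: date, as_of: date) -> int:
    """Return completed years of age on `as_of`, derived from a stored date of birth."""
    years = as_of.year - date_of_birth.year
    # Subtract one year if the birthday has not yet occurred in the `as_of` year.
    if (as_of.month, as_of.day) < (date_of_birth.month, date_of_birth.day):
        years -= 1
    return years


# A participant born 15 June 1990 who completes a form on 1 March 2017.
print(age_on(date(1990, 6, 15), date(2017, 3, 1)))  # 26
```

Because the form's submission timestamp supplies `as_of`, the participant never has to type an age at all.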

9. Use skip logic. The phrase “if applicable” should never appear in a form, especially forms designed for participant use. If you are asking the question, it had better be applicable. You can ensure that it is by using skip logic. Has the participant eaten that day? Only inquire about nutritional content for that day’s meals if she responds with “yes”.
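To make the idea concrete, here is a rough sketch of skip logic in Python. It assumes each question carries a small "is this relevant?" rule that is evaluated against the answers collected so far. The field names (`ate_today`, `meal_details`) and the `Question` structure are hypothetical, not taken from any specific form engine.

```python
from typing import Callable, Dict, List, NamedTuple


class Question(NamedTuple):
    field: str
    prompt: str
    # Returns True when the question applies, given the answers so far.
    relevant: Callable[[Dict[str, str]], bool] = lambda answers: True


QUESTIONS: List[Question] = [
    Question("ate_today", "Have you eaten today?"),
    Question(
        "meal_details",
        "What did you eat today?",
        # Only ask about meals if the participant reports having eaten.
        relevant=lambda answers: answers.get("ate_today") == "yes",
    ),
]


def questions_to_show(answers: Dict[str, str]) -> List[str]:
    """Return the fields that apply, given the answers collected so far."""
    return [q.field for q in QUESTIONS if q.relevant(answers)]


print(questions_to_show({"ate_today": "no"}))   # ['ate_today']
print(questions_to_show({"ate_today": "yes"}))  # ['ate_today', 'meal_details']
```

The participant who answers "no" never sees the follow-up, so the phrase "if applicable" has no reason to appear.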

10. Use branching logic. Branching logic can help a participant isolate the response she wishes to provide by a process of elimination. Suppose you need a participant to input a cancer diagnosis she had received. Given the wide variation in health literacy, we can’t assume she recalls the formal diagnosis. It may be more helpful to solicit this information through a series of questions easier to answer. Did the cancer involve a solid mass? Depending on the participant’s response, the following question might pertain to an organ system (if “yes” is selected) or blood cells (if “no” is selected). Just five yes or no questions can narrow a field of 32 options down to one.

Doesn’t the use of branching logic conflict with the strategy of minimizing the number of fields? These are guidelines, not rules, so trade-offs are sometimes necessary. A drop-down menu showing 32 options may represent just one field, but scrolling through that many selections (not all of which will be visible at once) places an enormous hurdle in front of the participant. The mental effort and time spent scrolling on that one field far outweigh any time savings that might be secured by eliminating three fields. Meanwhile, you’ll have frustrated the participant.
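The arithmetic behind branching logic is easy to check. The sketch below assumes an idealized case in which every yes or no question splits the remaining options evenly. Real diagnosis trees are rarely that tidy, so treat the results as a lower bound on the number of questions. Both function names are illustrative.

```python
def yes_no_questions_needed(option_count: int) -> int:
    """Fewest yes/no questions that can isolate one of `option_count` options."""
    if option_count < 1:
        raise ValueError("option_count must be at least 1")
    # Each answer can at best halve the remaining options, so the result is
    # ceil(log2(option_count)), computed here with integer arithmetic.
    return (option_count - 1).bit_length()


def options_remaining(option_count: int, questions_asked: int) -> int:
    """Options left after `questions_asked` evenly splitting yes/no questions."""
    return -(-option_count // (2 ** questions_asked))  # ceiling division


print(yes_no_questions_needed(32))  # 5
print(options_remaining(16, 2))     # 4
```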

11. Use autocomplete. There’s another way of solving the history of cancer problem above. As long as the participant can recall any portion of the diagnosis when spelled out, autocomplete can help them retrieve the full term. The best instances of autocomplete reveal all matches as the participant types, so that upon typing “Lym” a choice is immediately presented among:

Acute Lymphoblastic Leukemia (ALL)
Central Nervous System Lymphoma
Hodgkin Lymphoma

The ubiquity of predictive search among search engines like Google makes autocomplete a familiar experience for your participants.
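A bare-bones version of this matching can be sketched in Python as a case-insensitive substring search. The term list is illustrative only; a real form would draw on a full coded dictionary, and the `autocomplete` function is an assumption rather than any form engine's API.

```python
from typing import List

# Illustrative list only; not a complete or authoritative dictionary.
DIAGNOSES = [
    "Acute Lymphoblastic Leukemia (ALL)",
    "Acute Myeloid Leukemia (AML)",
    "Central Nervous System Lymphoma",
    "Hodgkin Lymphoma",
]


def autocomplete(typed: str, terms: List[str]) -> List[str]:
    """Return every term containing the typed text, ignoring case."""
    needle = typed.strip().lower()
    if not needle:
        return []  # Show nothing until the participant has typed something.
    return [term for term in terms if needle in term.lower()]


for match in autocomplete("Lym", DIAGNOSES):
    print(match)
```

Matching anywhere in the term, rather than only at the start, is what lets “Lym” surface “Hodgkin Lymphoma” as well as “Acute Lymphoblastic Leukemia (ALL)”.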

In an era where summoning a taxi or making a credit card payment is a 20-second task, participants will not (and should not) tolerate inefficiency on the web. You are competing with their other online experiences, not just traditional paper forms. The good news is that you can delight your participants by showing them that even contributing to medical research can be as easy as navigating their favorite web pages.

Show appreciation

You’ve read 85% of this post! Knowing that doesn’t guarantee you’ll read to the end, but it does make it more likely. Regular, responsive feedback is a powerful spur to action. Here are three ways to interact positively with your participant throughout (and even after) the form-filling process.

12. Convey their progress as the participant completes the form. Reflecting back to the participant the portion of the form they have completed and the portion that they have remaining serves two functions. The first is informative. You’ve anticipated the participant’s question (“how much longer?”) and answered it proactively. The second is motivational. Completing even simple tasks triggers the release of dopamine in our brains. We get a neurochemical rush every time we drop a letter in the mailbox or hit send on an email. 

You can reward your participant throughout the form-filling process by incorporating a dynamic progress bar into your ePRO form. Every page advance is an opportunity to “dose” your participant, visually, with a bit of happiness.
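As a rough illustration, the calculation behind a progress bar is only a few lines. The Python sketch below renders a text-only bar from the number of pages completed. The function name and the bar's appearance are assumptions made for this example, and a real form would draw the bar graphically.

```python
def progress_bar(pages_done: int, pages_total: int, width: int = 20) -> str:
    """Render a text progress bar such as '[#####-----] 50%'."""
    if pages_total <= 0:
        raise ValueError("pages_total must be positive")
    # Clamp so a stray value can never show less than 0% or more than 100%.
    pages_done = max(0, min(pages_done, pages_total))
    filled = width * pages_done // pages_total
    percent = 100 * pages_done // pages_total
    return f"[{'#' * filled}{'-' * (width - filled)}] {percent}%"


# Three of four pages completed.
print(progress_bar(3, 4))
```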

13. Autosave. Batteries die. Smartphones drop to the floor. Thumbs twitch and close screens. None of these scenarios is justification for losing a participant’s work. Your system should capture input on a field-by-field basis; that is, a back-end process should save the participant’s input into a field the moment he or she leaves that field for another. If a participant abandons a form and then returns to it, he or she should be able to resume where they left off. If you can signal back to the participant that their input has been saved with each field transition, all the better, as this leverages the same psychological power as the progress bar.  
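The field-by-field approach can be sketched as follows. This Python example uses an in-memory dictionary as a stand-in for the back end; a production system would write to a durable store. The class, method, and field names are all hypothetical.

```python
from typing import Dict, List, Optional


class DraftStore:
    """In-memory stand-in for a back end that saves one field at a time."""

    def __init__(self) -> None:
        self._drafts: Dict[str, Dict[str, str]] = {}

    def save_field(self, participant_id: str, field: str, value: str) -> None:
        # Called each time the participant leaves a field for another.
        self._drafts.setdefault(participant_id, {})[field] = value

    def saved_answers(self, participant_id: str) -> Dict[str, str]:
        """Return a copy of everything saved so far for this participant."""
        return dict(self._drafts.get(participant_id, {}))

    def resume_at(self, participant_id: str, field_order: List[str]) -> Optional[str]:
        """Return the first unanswered field, or None if the form is complete."""
        saved = self._drafts.get(participant_id, {})
        for field in field_order:
            if field not in saved:
                return field
        return None


FIELDS = ["pain_today", "pain_location", "pain_severity"]
store = DraftStore()
store.save_field("P001", "pain_today", "yes")

# The battery dies here. When the participant returns, nothing is lost:
print(store.saved_answers("P001"))      # {'pain_today': 'yes'}
print(store.resume_at("P001", FIELDS))  # pain_location
```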

14. Show gratitude. Imagine a campaign staffer asking you a series of questions about your views over the phone. You answer the last question and he or she hangs up without so much as a goodbye. Chances are, they’ve lost your vote on account of rudeness alone.

Don’t let this happen to your participants. When they submit a completed form online, they should immediately receive a “thank you” message that is specific to the task they have just completed.

Ensuring the optimal experience for participants supplying ePRO data is more than courtesy: it’s a critical measure for maximizing data quality, quantity and timeliness. So commit to dazzling the people who matter most in your research. Because, as in all relationships, you get back what you give. Click here to learn more about participant-friendly ePRO from OpenClinica.



Save the Date! OC17, December 4th and 5th, in Amsterdam

OC17, OpenClinica’s 9th Annual Global Conference, will take place in Amsterdam, on December 4th and 5th this year. This year’s theme? “Making the Complex Simple”

We will offer in-person Super User training from December 6th through the 8th.

Exact venue, times, pricing, and official call for presentations all coming soon. In the meantime, please let us know if you’re interested in attending!

Automate Your Collection of Lab Reference Ranges

Data managers invest a lot of time and attention documenting lab processes, and for good reasons. Regulatory compliance demands it. Also, ensuring the validity and clinical significance of lab results is critical to assessing safety and efficacy. But while necessary, this process is often inefficient and error-prone.

In an ideal clinical study, every lab sample would, within minutes of collection, find its way to a central lab whose equipment was forever up-to-date, whose validations were always fresh, and whose inner workings were transparent to the data manager. But clinical trials aren’t conducted in an ideal world. More often than not, data managers and local lab managers share an ongoing responsibility to document equipment features and report on results collected on a variety of instruments, all calibrated differently. The challenges associated with this process are familiar. Equipment changes. Validations expire. And one lab’s “normal” may be another lab’s “low.”

The task of keeping labs up to date for many data managers is akin to keeping dozens of centrifuges spinning at the same rate, all at the same time. Collecting lab reference ranges from one lab for one analyte may be straightforward, but when the process is multiplied across dozens of analytes and sometimes hundreds of sites, your study can be exposed to significant time delays and human error. Success in this task, like most, hinges on clear expectations and guidance. Here is where good data managers shine. By providing sites with explicit instructions, a deadline, and tools to boost completeness and accuracy, data managers can make the collection of reference ranges a lot less painful and time-consuming.

Anatomy of a Lab Reference Range

Ranges are always defined by either:

  • a standard applied to all labs contributing data to a study (“textbook ranges”), or
  • the individual lab

Often, the difference between the two is minimal, so adopting the textbook range can save time and administrative burden. For measures that are critical to analysis, though, using a textbook range may not be suitable. In that case, each local lab manager (or the site coordinator representing that lab) must communicate to the study’s data manager their “in house” range for all analytes measured in the study. In both cases, a range is not complete unless it specifies

  • the name of analyte
  • the unit of measure
  • the lower limit of normal, by gender and age
  • the upper limit of normal, by gender and age

Even for one analyte, the normal range for a 25-year-old female may differ from that of a 50-year-old female, or a 25-year-old male. Consequently, specifying a range for an analyte often means specifying a number of “sub-ranges” that, taken together, associate a normal range with every possible patient. For example:

In the course of providing comprehensive ranges for dozens of analytes, it’s easy for a lab representative to overlook (or duplicate) an age or gender inadvertently. A well designed, dynamic form for capturing these requirements can help ensure exactly one range is provided for any given individual.
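The "exactly one range per individual" check can be sketched in Python. The example assumes each sub-range covers whole years of age from `min_age` up to, but not including, `max_age`, and that every gender listed must be covered from age 0 to an assumed upper bound of 120. The structure, function name, and all numeric limits are made up for illustration and are not clinical reference values.

```python
from typing import Iterable, List, NamedTuple, Sequence


class SubRange(NamedTuple):
    gender: str    # e.g. "F" or "M"
    min_age: int   # youngest age covered, in whole years (inclusive)
    max_age: int   # first age NOT covered, in whole years (exclusive)
    lln: float     # lower limit of normal
    uln: float     # upper limit of normal


def coverage_problems(
    sub_ranges: Iterable[SubRange],
    genders: Sequence[str] = ("F", "M"),
    oldest_age: int = 120,
) -> List[str]:
    """List every gap or overlap in age coverage, one gender at a time."""
    all_rows = list(sub_ranges)
    problems: List[str] = []
    for gender in genders:
        rows = sorted((r for r in all_rows if r.gender == gender),
                      key=lambda r: r.min_age)
        next_uncovered = 0  # youngest age not yet covered by any sub-range
        for row in rows:
            if row.min_age > next_uncovered:
                problems.append(
                    f"{gender}: no range for ages {next_uncovered} to {row.min_age - 1}")
            elif row.min_age < next_uncovered:
                last_shared = min(row.max_age, next_uncovered) - 1
                problems.append(
                    f"{gender}: overlapping ranges for ages {row.min_age} to {last_shared}")
            next_uncovered = max(next_uncovered, row.max_age)
        if next_uncovered < oldest_age:
            problems.append(
                f"{gender}: no range for ages {next_uncovered} to {oldest_age - 1}")
    return problems


# Made-up limits; the male sub-ranges skip ages 18 through 49.
submitted = [
    SubRange("F", 0, 18, 11.0, 15.0),
    SubRange("F", 18, 120, 12.0, 16.0),
    SubRange("M", 0, 18, 11.5, 15.5),
    SubRange("M", 50, 120, 13.0, 17.0),
]
print(coverage_problems(submitted))  # ['M: no range for ages 18 to 49']
```

An empty list means every age and gender combination maps to exactly one sub-range.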

Anatomy of a Lab Reference Range Collection Form

Just as a value without a unit of measure is meaningless, so too is a local lab range that is not tied to a particular lab. Along with their ranges for each study analyte, labs should also provide a set of identifying information. The data manager, as part of her request to provide the ranges and lab information, should also specify the study for which the ranges are being collected. A complete lab reference range collection form includes all of these components.

Specified by the data manager

  • the name of the sponsor and study (avoid abbreviations or paraphrases)
  • which analytes are included in the study, and therefore require ranges from the lab
  • where the lab representative must send the completed file
  • a deadline for completing the file

Entered by the lab representative

  • the name, address, and applicable ID numbers (e.g. CLF, or core laboratory facility, number) of their lab
  • the name of the Principal Investigator for the site and study it serves
  • the effective date of the ranges to be provided
  • the LLN and ULN for each analyte, by gender and age

Tools You Can Use

For users of OpenClinica, we’ve designed a form template that can be used as a reference range collection form, which includes the components listed above. Try it here! Would you like to use a customized version of this form in your study? Contact your client support lead. For those not using OpenClinica, we’ve built an Excel workbook. Download it for free here.


Staying Current

Regardless of how labs communicate their reference ranges, it’s essential that the communication is ongoing. Changes in equipment or clinical guidelines often occasion changes to upper and lower limits of normal. That’s why an effective date must be documented for all ranges. Good data managers encourage sites to communicate any such changes promptly. Great data managers give them the reminders, and tools, to do so.

We welcome your input on the workbook above, just as we do on our data management metrics calculator. Please let us know what you find most valuable.

What DIA Stands For, To OpenClinica

What is it about the annual Drug Information Association meeting that energizes those of us working to improve the eclinical experience? Sure, it’s a terrific opportunity to showcase our products and services to research teams that could benefit from them (read: business development). But it’s more than that. “Just make the sale” is no credo for this industry. We serve those who serve patients, tirelessly working to enhance their lives. It’s impossible not to feel privileged by that responsibility, and the DIA conference is a chance for OpenClinica to demonstrate once again our resolve in meeting it. Every summer, we’re reminded to step back and prove to peers that our business aligns with the all-important goal of making trials as effective as they can be, so that safe and effective medicines get to the right patient at the right time. That means distilling the complex processes behind data capture into a story from which every DIA attendee, from data manager to CRA, can draw inspiration.

Back in February, I suggested an outline for that story: “making the complex easy”. I’m pleased to report that that narrative is gathering momentum. Our upcoming release combines power and ease-of-use in a manner that we believe is unprecedented. It will enable data managers, researchers and study participants to do more in less time, while rediscovering that sense of joy a well-designed web experience offers. It’s our way of keeping trials from turning into ordeals.

So yes, as a business, we want to grow by meeting more research teams and sharing the OpenClinica story with them. The annual DIA conference helps us achieve that. But by bringing together the most accomplished teams in drug development, the conference does more. It’s a place to improve our understanding of the challenges research teams face, and stay accountable to the ideals that led to our founding and all of our growth since then. If you’re attending DIA in Chicago, I hope you’ll find time to visit us at booth #1748, so we can show you just how energized those ideals are keeping us.

A Preview of our DIA Plans

Talks from drug development luminaries. Exhibits that combine “luxury apartment” with “miniature theme park.” And a city that offers some of the world’s best modern architecture.

Those descriptions don’t do justice to what DIA 2017 has in store, but they do fit the experience. As any veteran attendee can attest, there’s an outsized splendor to the conference. But it isn’t splendor for splendor’s sake. Half of it is celebration for the advances the industry has made in bringing life-changing therapies to market. The other half is a rallying cry to bring even more.


OpenClinica will be there, to join the celebration and the rallying cry. Attendees can find us at booth 1748, near the central break lounge. And we plan on using our patch of exhibit space to the fullest.

Our goal is simple: we want to thrill everyone there. We’ll do that by giving visitors to our booth an up-close look at the new OpenClinica, featuring a collaborative study designer, forms with beauty and brains, rich, visual reporting, and more. We’ve come to call this “re-engineering the e-clinical experience,” because it’s our hope that software in this industry can shed its reputation as a “utility” and gain one as the way researchers want to work (and the way participants want to contribute).

We think this new experience is as thrilling as, say, journeying through the human brain, or conducting zoological research in an alternate universe. Almost as thrilling, anyway. So we’re bringing along an Oculus Rift to help make our point. Visitors to our booth will get a chance to wear the goggles, grab the controllers, and immerse themselves in some fantastic worlds. Best of all, we’ll raffle the system off to one lucky winner.

But the most valuable offering comes from the attendees. We get to hear directly from them on what’s working, what’s not, and what needs changing in the world of eclinical. We’re confident we’ve addressed many of those needs with our upcoming release, but as a company devoted to continuous innovation, we’re never finished learning and iterating on our successes.

Will we see you there? If so, be sure to schedule a visit to our booth, and brace yourself for the e-clinical experience you’ve always imagined.


A Prescription for Data Management Health

“Three days to enter data, five days to answer queries.”

The rule couldn’t be any clearer. You’ve told your sites at the IM and reminded them in each newsletter. You know you won’t get 100% compliance, and that’s fine. You’re reasonable.

But this is getting out of control.

As a data manager, you’ll always live with missing forms, blank fields and open queries. It’s a chronic condition that gives rise to acute episodes around interim and final locks. You’ve learned to manage it, even thrive with it, but you know there’s got to be a more effective treatment regimen.

Good news. While there’s no panacea, I’d like to offer a tool you can begin using today, regardless of your systems or processes, to spur your sites onto improved data entry, query resolution, and even enrollment. But as with any treatment, we need to consider directions, precautions and potential side effects.

First, though, some background. If you use EDC and IxRS to facilitate data collection and enrollment, you’ve probably made it a habit to pull their stable of available reports at some regular interval. (If not — if you’re relying solely on the summary statistics and visualizations available on these systems’ dashboards — consider getting acquainted with the detailed exports. This post will explain why.) These reports are almost always available in some Excel-readable format. Chances are you’ve become practiced at applying some formulas to the data inside. (If not, here’s a tutorial on getting started.) The calculations you make are vital in assessing which sites are leading the pack in subject recruitment and data management tasks, including the timely entry of data or resolution of queries. You and your fellow study leaders depend on this information to refine projections, meet lock deadlines, and offer assistance to those sites behind the curve on key operational metrics. But do you share this information with sites?

Yes! As interim locks approach, I always email out the number of total open queries and missing forms, along with encouragement to tidy these issues up. If that’s your response, you’ve already adopted a best practice. But there’s more you can do.

Provided you do so with the right context and tone, you can and in many cases should communicate to each site exactly how they compare to their peers on several key metrics, from average open query age to subjects screened per month. When you supply this information, you recognize the site’s invaluable contributions, feed their natural and justified curiosity, and tap their desire to maximize their performance.

This practice involves three major challenges. The first challenge is calculating useful, “apples-to-apples,” site metrics from the raw data found in your EDC and IxRS reports. The second is distributing this information to each site in a systematic way.  The third is couching this information in a message that conveys gratitude and support. But each can be met.

Making the calculations

Here, I can offer some great news for users of OpenClinica, and a valuable tool for everyone. OpenClinica now supports a suite of configurable reporting dashboards, providing data managers and those they authorize (including sites) with clear, real-time visualizations of their study data. If you’re currently using OpenClinica, contact us and we’ll gladly share more details.

To help you get started now, regardless of your EDC or IxRS, we’ve created a workbook that performs dozens of calculations for each of your sites based on reports common to nearly every system. It’s free, and guides you step-by-step through converting raw exports into powerful analytics.
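For readers who prefer scripting to spreadsheets, a few of these calculations can be sketched in Python. The example assumes per-site counts have already been pulled from your exports and uses the screen failure definition of failed / (failed + randomized). The function names, dictionary keys, and site numbers are hypothetical, and ties in the ranking are broken arbitrarily for simplicity.

```python
from typing import Dict


def site_metrics(screened: int, failed: int, randomized: int,
                 months_activated: float) -> Dict[str, float]:
    """Compute a few comparable, per-site rates from raw counts."""
    decided = failed + randomized
    return {
        # Screen failure rate: failed / (failed + randomized).
        "sf_rate": failed / decided if decided else 0.0,
        "screened_per_month": screened / months_activated if months_activated else 0.0,
        "randomized_per_month": randomized / months_activated if months_activated else 0.0,
    }


def rank_sites(values: Dict[str, float], higher_is_better: bool = True) -> Dict[str, int]:
    """Rank sites from 1 (best) downward on a single metric."""
    ordered = sorted(values, key=lambda site: values[site], reverse=higher_is_better)
    return {site: position for position, site in enumerate(ordered, start=1)}


# Hypothetical counts for two sites.
metrics = {
    "101": site_metrics(screened=24, failed=6, randomized=18, months_activated=6),
    "102": site_metrics(screened=10, failed=5, randomized=5, months_activated=5),
}
screen_rates = {site: m["screened_per_month"] for site, m in metrics.items()}

print(metrics["101"]["sf_rate"])  # 0.25
print(rank_sites(screen_rates))   # {'101': 1, '102': 2}
```

Dividing by months activated is what makes the comparison "apples-to-apples": a site open for two months is not penalized against one open for twelve.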

Distributing the information

Once you’ve created a table of performance metrics by site, you have the beginnings of a “mail merge.” You simply need to add a column specifying the email address of the individual responsible for data entry for each site.

The steps for executing a mail merge differ from email client to email client. However, some starter documentation is available here:

Setting the context

So far, we’ve touched on the technology of quantitative performance reporting. But what about the art? It’s crucial that sites understand that your intent isn’t to chastise, but to inform and encourage. The metrics you calculate are just one piece of a broader discussion, which would include particularities that simply aren’t reflected in a spreadsheet, such as patient availability and staff experience. A site whose “screened per month” measure ranks in the bottom quartile may have had to overcome incredible hurdles to enroll their six or seven subjects. Meanwhile, they may be adding valuable thought leadership.

To establish the right tone, you might consider adopting a message template like this one:

Hello Site <<site_id>>,

The Data Safety Monitoring Committee will meet two weeks from today, so it’s important we enter all data for visits that occurred on or before March 31st by this Friday, and close all queries by next Wednesday. We can’t thank you enough for your diligence in screening qualified patients and entering data. As you well know, your efforts here support not just our study, but the patients themselves.

It’s been an incredibly busy month, and we recognize it’s not always possible to enter data within five days of events. We realize some queries take weeks to close. And we know your first priority remains and should remain your patients, whether they’re participating in this study or not. Your accomplishments are all the more impressive in light of these facts.

We believe you deserve insight into the contributions you’re making to our study. That’s why we’re initiating a weekly, custom report to share your site’s progress with you. We understand you may be curious about how your “numbers” stack up against those of other sites, so we’ve included some comparative measures in this report. Also, to help you navigate data management, we’ve listed out your missing forms and open queries as of the report date shown. (Please note that you may have closed one or more queries or submitted one or more forms in the time between report generation and your receipt of this email. The numbers below are not real-time.)

Thank you again for all you do in service to our study and your patients!

Site <<site_id>> By the Numbers
Report date: <<date>>
Screened: <<screened>>
Failed: <<failed>>
Randomized: <<rand>>
SF Rate (Failed / (Failed + Randomized)): <<sfrate>>
Months Activated: <<mons>>
Screened/Month: <<srate>>
Screen Rate Country Rank: <<srankc>>
Screen Rate Global Rank: <<srankg>>
Randomized/Month: <<rrate>>
Randomization Rate Country Rank: <<rrankc>>
Randomization Rate Global Rank: <<rrankg>>
Days Since Last Screening: <<dsls>>
Days Since Last Randomization: <<dslr>>
Open Queries: <<oq>>
Queries Per Subject Screened: <<qrate>>
Queries/Subject Country Rank: <<qrankc>>
Queries/Subject Global Rank: <<qrankg>>
Average Age of Open Queries (Days): <<avgqage>>
Age of Oldest Query (Days): <<oldestq>>
Query List: <<qlist>>
Missing Pages: <<mpgs>>
Missing Pages Per Subject Screened: <<mrate>>
Missing Pages Per Subject Country Rank: <<mrankc>>
Missing Pages Per Subject Global Rank: <<mrankg>>
Average Age of Missing Pages (Days): <<avgmpgage>>
Age of Oldest Missing Page (Days): <<oldestmpg>>
Missing Page List: <<mpgslist>>
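If your email client's mail merge is unavailable, the substitution step itself is simple to script. The Python sketch below fills merge fields in a template from one site's row of metrics. It assumes fields are written between double or triple angle brackets, with optional spaces, and it stops short of sending any email. The `fill_template` function and the sample values are hypothetical.

```python
import re
from typing import Dict

# Matches merge fields written as <<name>>, << name>>, or <<<name>>>.
MERGE_FIELD = re.compile(r"<{2,3}\s*(\w+)\s*>{2,3}")


def fill_template(template: str, row: Dict[str, str]) -> str:
    """Replace each merge field with the matching value from one site's row."""

    def lookup(match: "re.Match[str]") -> str:
        name = match.group(1)
        if name not in row:
            # Fail loudly rather than send a site a half-filled report.
            raise KeyError(f"No value supplied for merge field '{name}'")
        return str(row[name])

    return MERGE_FIELD.sub(lookup, template)


template = "Site <<site_id>> By the Numbers\nOpen Queries : << oq>>"
print(fill_template(template, {"site_id": "101", "oq": "7"}))
# Site 101 By the Numbers
# Open Queries : 7
```

Raising an error on a missing value is a deliberate choice here: a report with a blank metric is more likely to confuse a site than to encourage it.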

Some final precautions

How often you provide a report like the one above, and what you include in it, are at your discretion. Fast-moving infectious disease trials may warrant a weekly report. Large, endpoint-driven cardiac studies may benefit from just one report per month. Also, carefully consider the cultural differences that exist among sites in various countries. There may be no acceptable way of communicating comparative metrics in some.

There’s power in your metadata. You should consult it frequently on your own, weekly if not daily. You can use the workbook above to do that and nothing more. But we have an obligation to patients worldwide to conduct trials in the most efficient manner compatible with the highest data quality. Bringing some gentle pressure to bear on sites is one method of achieving that. If you adopt some version of the practice described in this post, please let us know your experience with a comment or email.

Around the World in Three Data Integrations

Big data has been a recurring topic in medical research news for years now. It’s a topic that deserves our attention. Big data’s potential to revolutionize fields like genomics and to advance precision medicine generally is stunning. Today, though, a lot of the press is speculation. Robustly effective designer drugs for cancer, based on the patient’s genetic markers, remain an ideal that is likely decades away.

But if we adopt a broader conception of big data–one that includes the massive infrastructure supporting social media, the Internet of Things, and (potentially) interoperable health record platforms–real world applications are not hard to find.

March 2015: Researchers at Stanford University recruit 11,000 subjects into a cardiovascular study in 24 hours using Apple’s open source ResearchKit app.

September 2015: “By outfitting trial participants with wearables, companies are beginning to amass precise information and gather round-the-clock data in hopes of streamlining trials and better understanding whether a drug is working… So far, there are at least 299 such clinical trials using wearables, according to the National Institutes of Health’s records.” – Bloomberg News

July 2016: The National Cancer Institute introduces OpenGeneMed, “a portable, flexible and customizable informatics hub for the coordination of next-generation sequencing studies in support of precision medicine trials.”

One facet of these examples stands out. For all their diversity, projects that rely on big data rely just as much on collaboration.
Moving from genomes to biomarkers to disease risk models and personalized treatment requires more than one big dataset: it requires the integration of data from multiple systems that are secure, geographically separated, and disparately schematized.

Ten years ago, the ability to handle this task might have been seen as a leading-edge, if not commonly leveraged, feature of clinical technology. Today, software that cannot facilitate integration is doomed to obsolescence.

eCitizen of the Data World

What does this requirement mean for EDC? Simply put, those of us building data capture solutions need to look far beyond the “coordinator keying in vitals” use case. (Our solution for that use case had better already be rapid, reliable and easier to execute than ever, considering the burdens placed on trial sites in 2017.) With “insight by integration” at the forefront of research strategies, we technologists had better think of our system as a world traveler: one familiar with the laws in multiple countries, authorized to enter and leave those countries, and fully knowledgeable of their languages and customs. In the world of data management, this means the ability to pass authentication to enter a source database, map the data to a target, and leave the source while maintaining data provenance.

As a long-standing promoter of open, standards-based interoperability, OpenClinica represents this “world traveler.” The native language of OpenClinica’s EDC is the Clinical Data Interchange Standards Consortium Operational Data Model (CDISC ODM). This fact alone makes the OpenClinica data model an ideal cosmopolitan, instantly conversant with research peers around the globe. But holding fast to one standard is not sufficient. We need to be willing to learn new languages. By offering a well-documented web services API, OpenClinica makes it easy for its users to leverage RESTful web services, together with OAuth protocol version 2.0, to systematically:

  • extract data from almost any third-party source (e.g. labs and imaging centers),
  • associate each element of that data to the relevant Case Report Form (CRF) field.
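
The second of those steps, associating each incoming element with a CRF field, can be pictured as a lookup table. The Python below is a minimal sketch under stated assumptions: the lab field names, the item OIDs and the map_lab_record function are hypothetical and are not part of any OpenClinica API. It makes no network call; in a real integration the record would arrive through an authenticated REST request.

```python
# Hypothetical mapping from a lab vendor's field names to CRF item OIDs.
LAB_TO_CRF = {
    "hgb_g_dl": "I_LABS_HEMOGLOBIN",
    "wbc_10e9_l": "I_LABS_WBC",
}


def map_lab_record(record, mapping=LAB_TO_CRF):
    """Translate one third-party record into CRF item values.

    Returns (subject_id, items, unmapped), where items maps each item OID to a
    string value and unmapped lists source fields with no CRF target, so that
    nothing is dropped silently.
    """
    subject_id = record["subject_id"]
    items, unmapped = {}, []
    for field, value in record.items():
        if field == "subject_id":
            continue
        if field in mapping:
            items[mapping[field]] = str(value)
        else:
            unmapped.append(field)
    return subject_id, items, unmapped


# Made-up record, shaped the way a lab's JSON payload might be.
record = {"subject_id": "SS_001", "hgb_g_dl": 13.2, "wbc_10e9_l": 6.1, "note": "hemolyzed"}
print(map_lab_record(record))
```

Reporting the unmapped fields instead of discarding them is a deliberate choice here: it gives the data manager a way to spot a source column that nobody has mapped yet.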

APIs and authentication protocols offer the most direct route to turnkey integration. But it’s not enough to be powerful in the pursuit of data integration. A system has to be flexible, too, when tapping data sources that aren’t available to an API. For OpenClinica, this means providing a host of configurable tools to data managers and data entry personnel.

  • OpenClinica’s Jobs feature allows for custom imports from local files. A Job may be scheduled to run at any frequency, so that users responsible for data entry based on a regularly updated flat file (e.g. a CSV on their hard drive) may provide that data without keying in each element. A Job well-defined and set up just once improves accuracy and saves hours of research time.
  • An Import Data feature makes ad hoc batch uploads easy, as well. Users simply generate an XML file based on OpenClinica-supplied Object Identifiers (OIDs) to map data from the import file to the EDC.
  • OpenClinica supports a variety of Single Sign On (SSO) protocols, reducing repetitive authorization while maintaining security. OpenClinica is also an early and already experienced adopter of SMART on FHIR, a set of open specifications to integrate its core EDC with Electronic Medical Records (EMR) and other health IT systems.
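
To make the file-based route concrete, here is a minimal Python sketch that turns a small CSV into an XML document nested the way CDISC ODM nests clinical data (subject, study event, form, item group, item). Every OID in it is a made-up placeholder, the csv_to_odm function is hypothetical, and the sketch does not claim to cover every attribute a particular OpenClinica import expects. Real OIDs would come from your own study.

```python
import csv
import io
import xml.etree.ElementTree as ET

ODM_NS = "http://www.cdisc.org/ns/odm/v1.3"

# Placeholder OIDs, for illustration only.
STUDY_OID = "S_DEMO"
EVENT_OID = "SE_BASELINE"
FORM_OID = "F_VITALS"
GROUP_OID = "IG_VITALS"
COLUMN_TO_ITEM_OID = {"sbp": "I_VITALS_SBP", "dbp": "I_VITALS_DBP"}


def q(tag):
    """Qualify a tag name with the ODM namespace."""
    return f"{{{ODM_NS}}}{tag}"


def csv_to_odm(csv_text):
    """Build an ODM-style XML string with one SubjectData element per CSV row."""
    ET.register_namespace("", ODM_NS)  # serialize ODM as the default namespace
    odm = ET.Element(q("ODM"))
    clinical = ET.SubElement(odm, q("ClinicalData"), StudyOID=STUDY_OID)
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = ET.SubElement(clinical, q("SubjectData"), SubjectKey=row["subject_key"])
        event = ET.SubElement(subject, q("StudyEventData"), StudyEventOID=EVENT_OID)
        form = ET.SubElement(event, q("FormData"), FormOID=FORM_OID)
        group = ET.SubElement(form, q("ItemGroupData"), ItemGroupOID=GROUP_OID)
        # Each mapped CSV column becomes one ItemData element.
        for column, item_oid in COLUMN_TO_ITEM_OID.items():
            ET.SubElement(group, q("ItemData"), ItemOID=item_oid, Value=row[column])
    return ET.tostring(odm, encoding="unicode")


sample = "subject_key,sbp,dbp\nSS_001,120,80\nSS_002,135,85\n"
print(csv_to_odm(sample))
```

A script of this kind, run once against a regularly updated flat file, plays the same role as the Job described above: the mapping is defined a single time, and no one keys in the individual values.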

A Look at Our Passport

So far, I’ve outlined a set of capabilities required of any EDC in 2017, and claimed that OpenClinica meets them all. But where’s the evidence? In the second half of this post, I’m going to put three of our partners in the spotlight. For each, OpenClinica was able to play a pivotal role in bringing together multiple data sources.

The Dutch Translational Research IT (TraIT) project, an initiative from the Center for Translational Molecular Medicine (CTMM), “enables integration and querying of information across the four major domains of translational research: clinical, imaging, biobanking and experimental (any-omics).” While multiple systems power that integration, OpenClinica is the central hub. TraIT continues to host and support OpenClinica, having joined together 10 trials on the platform in October of 2011. By March of 2015, adoption had grown to include 852 users at 157 sites conducting 136 studies, and by October of 2016, that usage had grown to more than 2,800 researchers and 250 research projects.

Among the selection criteria used to evaluate and ultimately select OpenClinica as a partner, TraIT specifically cited:

  • “links to other data storage and analysis tools within the TraIT platform, allowing researchers to integrate and analyse case report data, imaging data, experimental data and bio banking information,” and…
  • the “possibility to integrate with Trusted Third Party which handles proper (de-)identification of participant data within OpenClinica and other tools/services used in TraIT.”

It is worth noting that, in addition to an infrastructure that allows database integration, TraIT relies equally on OpenClinica’s open source model to build custom integrations. “The advantage of the Open Source model compared to a proprietary model, is that multiple independent contributors can review the source code, making enhancements which are then added to the version available to the entire OpenClinica community.”

Usage by the broader community helps ensure the innovation’s longevity and continued evolution. TraIT leverages these tools (such as the OC Data Importer) to help their sites import vast quantities of data in bulk fashion, eliminating transcription errors and delays.

The 100,000 Genomes Project, led and funded by Genomics England, is another example of a large-scale effort to combine clinical and genomic data. The 100,000 Genomes Project is sequencing 100,000 genomes in order to:

  • better diagnose rare disease,
  • understand its causes, and
  • set a direction for research on effective treatment

Whole genome sequencing (WGS) offers the best hope for determining which genetic mutations give rise to particular phenotypes, including disease states. WGS yields the syntactical equivalent of the three billion nucleotide base pairs that make up just one strand of one individual’s DNA, so a research program involving even one such sequencing has already entered the territory of “big data.” While highly specialized systems are responsible for sequencing itself, and yet others for the analysis of the output, an equally essential tool for this research is a system that can manage the clinical data and biospecimen tracking of subjects visiting one of several geographically dispersed clinical centers. Here, too, OpenClinica serves as the hub. Researchers at 13 NHS Genomic Medicine Centres are using OpenClinica to register participants, capture clinical information, and ensure that blood samples stay matched with their de-identified contributors.

Project leaders have made public a 10-page guide to researchers on this process, one whose brevity and clarity speak to how easy OpenClinica makes it. Due to the dedication of the researchers, the collaboration of participants and the fitness of the technology, the project is on track for completion in 2017.

PECARN, the Pediatric Emergency Care Applied Research Network, is the first federally-funded pediatric emergency medicine research network in the United States. To date, PECARN has conducted 24 studies that have already changed how clinicians are preventing and managing acute illness and injury in children.

As part of their mission to advance clinical practice, PECARN has taken a lead role in the implementation and study of clinical decision support tools. For all the potential benefit offered by these tools, questions remain about their adoption and effectiveness. Do physicians and nurses generally follow evidence-based recommendations for treatment or diagnostic procedures? When they do, are outcomes improved?

To help answer these questions, PECARN study leaders conducted a nonrandomized trial with concurrent controls at thirteen emergency departments between November 2011 and June 2014. These thirteen departments were consolidated into ten research sites. At eight of these sites, clinicians creating an EHR record for any patient <18 years old with minor blunt head trauma were automatically presented with a custom template. This template solicited additional data about the injury before providing recommendations on CT use and risk estimates of clinically important traumatic brain injuries. (CT imaging of the brain is associated with a non-negligible risk of tumor formation in those who undergo the procedure, especially children. At the same time, early detection of ciTBI–i.e. injuries leading to death or requiring neurosurgery, intubation for more than 24 hours, or hospital admission for two or more nights–is critical for effective intervention. The recommendations provided by the EHR template were intended to limit CT use to those patients who met established predictive criteria for significant ciTBI risk.)

The clinicians’ work in their EHR, together with subsequent cranial imaging and TBI-related outcomes, all generated data that would require aggregation to determine (1) how frequently care providers heeded recommendations surrounding CT use, and (2) whether the predictive rules for ciTBI risk were valid. That aggregation fell to OpenClinica. By accepting reports generated by each site’s EHR to automatically create study subjects, and by integrating with the source of imaging data at each site, OpenClinica enabled a true e-source study that left clinical workflows unaffected. Not one of the 28,669 subjects created in the study database required manual data entry.

Images courtesy of Jeff Yearley, BA, Manager of Clinical Data Management, Data Coordinating Center, University of Utah. Click here to download the slides containing the images above.

The moral? Big data isn’t just found: it’s made, through the coordinated efforts of both people and systems that travel light and fast. You’re contributing to big data during more and more of your waking hours these days. If you want to help shape it through technology, get ready to cooperate… and pack your digital bags.