CDS for Non-Data Scientists Part 2: Resources for Bridging the Gap

59,000,000,000,000,000,000,000 bytes. That’s 59 zettabytes, or 59 sextillion pieces of discrete information. Don’t feel bad; I had to look up those words too. According to the International Data Corporation, that is the total amount of data estimated to have been generated since the start of the digital age through 11:59 pm on December 31, 2020.

| Quantity | Magnitude | SI Prefix |
| --- | --- | --- |
| Data on planet Earth | 10²¹ | Zetta- |
| Rough estimate of the number of sand grains on Earth | 10¹⁸ | Exa- |
| Current estimate of the number of stars in the universe | 10²¹ | Zetta- |
| Estimate of the drops of water on Earth | 10²⁴ | Yotta- |

Assume for a moment that you were able to magically download all that data onto a single computer. It would take 3,277,777,778 of the largest currently commercially available hard drives (18 TB), each costing approximately $1,000 at this posting, for a grand total of well over 3 trillion dollars. Again, let’s say that, through science or magic, each HDD was the thickness of a US dollar bill; they would form a stack over 220 miles tall. One more figure, because I am having WAY too much fun [OpenClinica asked a data nerd to write about data, so they should have known what they wrought]: in roughly a century (give or take a decade), humanity is expected to have generated more pieces of data than there are atoms on Earth.
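For the skeptics (and fellow data nerds), the arithmetic behind those figures is easy to check. A quick sketch, using the approximations quoted above plus the commonly cited ~0.0043-inch thickness of a US dollar bill:

```python
# Back-of-the-envelope check of the figures in the text. Drive price,
# capacity, and bill thickness are all rough approximations.
ZETTABYTE = 10**21
TERABYTE = 10**12

total_bytes = 59 * ZETTABYTE            # IDC's 2020 estimate
drive_capacity = 18 * TERABYTE          # largest commercial HDD
drives_needed = total_bytes / drive_capacity
total_cost = drives_needed * 1_000      # ~$1,000 per drive

BILL_THICKNESS_IN = 0.0043              # thickness of a US dollar bill, inches
stack_miles = drives_needed * BILL_THICKNESS_IN / (12 * 5280)

print(f"{drives_needed:,.0f} drives")   # 3,277,777,778 drives
print(f"${total_cost:,.0f}")            # $3,277,777,777,778
print(f"{stack_miles:.0f} miles")       # 222 miles
```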
[Fake government NASA doc (satire): “In a recently leaked document, NASA has expressed interest in converting Mars into the largest data center in the solar system.”]

With all this data, it’s hardly a surprise that data science is one of the “it” careers and that more and more career paths need to be data fluent.

For several years now, I’ve held a top-down view of data science that might make me a lot less popular: learn data science, and I mean really learn it, and clinical data science will be a snap [mostly applying principles you already know; after all, to a computer, an int data type is pretty much an int whether it’s a count of the number of TVs sold at a store or a heart rate]. The inverse is rarely true. I once had a colleague, a statistician (as in a real, Ph.D.-holding, more-Greek-symbols-than-English-on-his-whiteboard statistician), who convinced me of a similar rationale with regard to statistics vs. biostatistics.

What makes a good data scientist?

If you are reading this blog, especially at 10:30 at night or while you munch away on lunch, you probably already have some of the qualities of a data scientist! Do you have an innate, some might argue pathological, desire to understand how things work? Are you infamous for your attention to detail? Obviously, success in data science is a bit more complicated and nuanced. Still, at the heart of it all, data science is driven by a desire to use information to improve decision-making. No knee-jerk decisions or gut feelings are allowed here. Those devils and angels on your shoulders can stay home too. So if you fit these criteria, read on, because you may have just found your calling. If you don’t, read on anyway.

How do I break into the data science field?

This is a path that I myself have taken perhaps only the second or third step on, so please, please don’t treat this as an exhaustive, stay-inside-the-lines type of article. Your path is yours alone. I can only offer some guidance and helpful hints that I’ve found along the way.

The first question you really need to grapple with is how much you want to get into data science. That isn’t meant to be derisive or anything of the sort. You’ll learn, if you haven’t already, that virtually everything in life has a cost. Is that super-specialized Ph.D. program worth the 5-7 years of work and time away from the workforce, not to mention the late nights staring at the computer screen? Maybe it is, and if so, go for it, but you still need to ask yourself if it’s worth it to you.

In the broadest sense, entering or advancing in the data science field takes the form of formal vs. informal training. Traditionally, formal training takes the form of an advanced degree of some type, while the informal is far more self-driven and created by you. Although neither is inherently better than the other, they both have some positives and some drawbacks, as you’ll soon see.

Back to school: is formal higher education suitable for you?

Theoretically, you can still enter the data science field with a bachelor’s degree and a strong math, science, and programming background; however, those days seem to be numbered. As more and more institutions have started offering advanced degrees in data science, the expectation that a serious candidate should have post-bachelor’s training has increased. Therefore, if you decide to invest the time, money, and effort into a graduate degree, you should know a few things first.

First, before you set foot in a data science class at a major university, you are usually expected to have completed:

  • three semesters of calculus,
  • one semester of linear algebra,
  • freshman computer programming,
  • and possibly differential equations and/or upper level statistics.

Not only that, but you’re expected to remember them. I know, I was shocked too. So if you don’t have those classes on your transcript, or you don’t recall how to calculate the multiplicative inverse of a matrix, you should probably brush up on those topics. More on that later.
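If “multiplicative inverse of a matrix” drew a blank just now, here is a quick pure-Python refresher for the 2×2 case (in practice you’d reach for a library such as NumPy; this is only the textbook formula):

```python
def inverse_2x2(m):
    """Multiplicative inverse of a 2x2 matrix [[a, b], [c, d]]."""
    (a, b), (c, d) = m
    det = a * d - b * c                # determinant
    if det == 0:
        raise ValueError("matrix is singular; no inverse exists")
    # Textbook formula: (1/det) * [[d, -b], [-c, a]]
    return [[d / det, -b / det],
            [-c / det, a / det]]

inv = inverse_2x2([[4, 7], [2, 6]])    # det = 4*6 - 7*2 = 10
print(inv)   # [[0.6, -0.7], [-0.2, 0.4]]
```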

This is my second attempt at a graduate program, so here is some advice for those considering a research-based degree. First, plan on spending roughly 4 hours every week studying and preparing for every 1 hour of coursework on your schedule; sometimes this can be more, rarely less, depending on the specific course and your background. Then, if you have teaching duties, plan on another 15 or so hours per week teaching and grading. Oh, then you have research and writing you’re expected to do, so budget another 15 hours for that. Oh, and I almost forgot lab group meetings and any administrative responsibilities you might have; to train graduate students in how to run a lab, faculty often delegate such duties to them. Finally, as you continue in any program, you will usually take on mentoring and leadership roles to fill your few remaining hours of freedom. Can you somehow sleep a negative number of hours??

A word to all the “Gentlemen C’s” out there: know that you are generally expected to maintain a 3.0 GPA in any graduate program.

Now that I’ve gotten most of the bad stuff out of the way, here’s the good. If you like those kinds of environments and have a deep desire to learn, you may well have the time of your life. You’ll often be working on the cutting edge of science and technology, alongside people who literally wrote the book in their subject. There is also an increasing number of online degree programs and ones designed specifically for working professionals.

Notice that a couple of paragraphs earlier I mentioned research-based degrees. Usually, you can tell a degree is research-based by its letters: M.S. and Ph.D. On the other hand, professional degrees are more likely to be MPH, MDS, or any number of other acronyms. Professional degrees will usually be more practically based and, perhaps more importantly to you, may require less in the way of prerequisites. There is always a tradeoff, though: professional degrees typically require work experience, usually 3-5 years. For instance, the degree I’m currently pursuing (a professional degree) doesn’t require any specific courses on your transcript, BUT you are still held accountable for those skills and knowledge as if you had just taken the prerequisites I listed earlier.

A few words about online graduate degrees

Yes, there are a few high-quality online degrees out there, but it is often buyer-beware. Some things you want to look for are:

  1. Are they attached to a traditional, physical university? And if so, do they offer an approximately equivalent “on-campus” program? This is always a plus because such universities must maintain a certain standard or risk losing their accreditation. There is also a greater likelihood that you will be dealing with quality faculty.
  2. Will your degree or transcript say “Online” on them? Honestly, this is a very subjective issue. For me, concern 1 is far more critical than this. Still, if you’re going to put in 2-5 years of work on a master’s degree, you don’t want to risk coming out the other side with it being perceived as worth less than a “traditional” program.

Eh…got anything else???


One step below in intensity from an official degree are boot camps, webinars, certificates, and professional development programs. These can be very practical options, allowing you to demonstrate your talent with a type of real-life legitimacy that formal degrees don’t inherently impart.

Lastly, just taking time to read and learn about a given subject can be hugely beneficial. Perhaps you don’t need to know every nook and cranny of data science all at once. Maybe you can start by asking a single question or picking a specific topic. For example, “How do I use SQL to program a database?” or “What is this ‘GitHub’ thing I keep seeing in my Google results?” Then you can expand your knowledge from there.
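To make that first SQL question concrete, here is a minimal sketch using Python’s built-in sqlite3 module, so there is nothing to install (the table and column names are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")   # throwaway in-memory database
cur = conn.cursor()

# Define a tiny table and load three rows.
cur.execute("CREATE TABLE subjects (id INTEGER PRIMARY KEY, heart_rate INTEGER)")
cur.executemany("INSERT INTO subjects (heart_rate) VALUES (?)",
                [(72,), (88,), (110,)])

# Ask a question of the data: which subjects have an elevated heart rate?
cur.execute("SELECT id, heart_rate FROM subjects WHERE heart_rate > 100")
rows = cur.fetchall()
print(rows)   # [(3, 110)]
conn.close()
```

From a starting point like this, you can expand toward multiple tables, joins, and eventually full database design.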

If you consider that the true pioneers of any field had to discover it and make it up as they went, there’s no reason why we couldn’t learn the same way.

Some (but far from all) US Specific Programs

Traditional Degrees (MS, PhD, or both)

  • University of Arkansas Medical School [on campus certificate, masters & phd in biomedical informatics; including a specialization specifically in clinical research informatics]
  • University of Colorado Boulder [on campus MS-DS]
  • Indiana University Bloomington [several on-campus masters & ph.d programs in bioinformatics and data science]

Online Degree Programs

  • University of Colorado Boulder [online MS-DS]
  • Indiana University Bloomington [online MS in Data Science]
  • University of New Hampshire [online MS in Health Data Science]

Other Educational Opportunities

  • SpringBoard [online data science bootcamp]
  • University of Colorado Clinical Data Science Specialization [Certificate on Coursera]

Some Resources that I like

Addendum: My Data Science Pathway (a personal story)

So, now that we’ve covered all that, I’d like to give you a peek into my journey so far to becoming a data scientist. Keep in mind, we could be polar opposites for all I know, so you may favor a different path, and I certainly have no patent on this method, so feel free to rip it off 100%.

First, it’s been a while since I had any type of formal schooling. I left my original graduate program in 2012 and haven’t had much exposure since then, and running the occasional ANOVA test hardly keeps everything fresh, so I knew I needed to brush up on my background knowledge. To refresh my math and statistics chops, I chose a quiz-based practice site; though there are plenty of similar websites, I picked the one that matched my learning style and budget. I generally try to do a lesson (they call them quizzes) per day.

I knew I also needed to improve my programming skills for data science. I feel pretty comfortable with C++, C#, and Java, for instance; Python, not so much. It’s just a question of exposure to the language for me, so I picked a site that deals specifically with data science applications. It admittedly costs a little more than I wanted to spend, but I found a nice 3-month free subscription through the GitHub Student Developer Pack, where all I needed was a valid school email address.

One last preparatory tip I’ll offer is to make a cheat sheet. No, not to actually cheat. Virtually every language and package (you can commonly have 10 or more active at any time in real-life applications) has slight variations from the others, and I’ve found it really is impossible to keep them all straight at once. Case in point: perhaps you’re working with 2 different SQL databases at once, one programmed in PostgreSQL, the other in T-SQL. There is just enough difference between those two to cause you headaches, so maintaining a 3-4 page quick reference can be a life-saver. I’ve found prebuilt cheat sheets to be a solid starting point, but mine usually end up covered in post-it notes all the same.
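For a taste of why such a cheat sheet earns its keep, here is the same intent written in the two dialects just mentioned (table and column names are invented for illustration):

```python
# Same intent, two SQL dialects -- exactly the sort of trivia a cheat
# sheet exists to capture. (Table/column names are invented examples.)

# Limiting rows: LIMIT in PostgreSQL vs. TOP in T-SQL.
first_ten_pg   = "SELECT full_name FROM subjects ORDER BY id LIMIT 10;"
first_ten_tsql = "SELECT TOP 10 full_name FROM subjects ORDER BY id;"

# String concatenation: || in PostgreSQL vs. + in T-SQL.
concat_pg   = "SELECT last_name || ', ' || first_name FROM subjects;"
concat_tsql = "SELECT last_name + ', ' + first_name FROM subjects;"
```

Neither query runs unmodified on the other system, which is exactly the kind of headache a 3-4 page quick reference prevents.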

I made good use of Kindle and YouTube for virtually every other topic to make sure I was solid. This leads me to another pearl of wisdom about graduate school: you can’t be too prepared. Transitioning from an undergraduate program (or what you remember of one) to a graduate program is like suddenly being drafted into a professional sport. The level at which you are expected to perform is high, and the learning curve is steep.

For my actual master’s program, I chose one that launched only recently. The University of Colorado Boulder has an online version of their on-campus master’s degree that is a solid fit for what I want. I also have found I do better with shorter courses, and this program has academic terms of 2 months. To a good approximation, they take each 3-hour class in the on-campus degree and chop it into 3 pieces. It is also 100% asynchronous; hence as long as the coursework is completed by the deadline, I can watch the lectures and do the work whenever I can fit them in.

User Acceptance Testing (UAT): The One Test Every Data Manager Should Try to Fail

“Unless someone like you cares a whole awful lot, nothing is going to get better.
It’s not.”
 — Dr. Seuss

Years of college and training. Professional certification. Memorizing what seems like an entire periodic table of data management acronyms: CDISC, CDM, CRF, eCRF, EDC, SOP, UAT. Tests. More tests. Clinical data managers spend their careers ensuring the accuracy and integrity of clinical trial data. It’s a bit ironic, then, that perhaps the most important CDM test is one that we are supposed to fail.

User Acceptance Testing (UAT) is the process of testing CDM software. UAT is the last step along the path to a live study launch. It’s the 11:30 AM seminar speaker that is the only thing between you and lunch. The proximity of UAT to study launch is unfortunate. Our collective mindset at this stage screams, “Can’t we just get on with it?” The necessity of UAT, however, cannot be overstated. Done well, two weeks of UAT will save the clinical data manager months of headaches in post-collection data cleaning.

Breaking Bad: Why should we care about UAT?
The obvious answer is that we care about data accuracy and integrity. This answer is specious. Of course we care. This is why we will diligently (and manually) correct errors after the fact. If bugs and missing or incorrect form logic are not caught until the end of the study, we will dive in and diligently correct hundreds of data points without hesitation.

The correct answer as to why we should care about UAT, therefore, is that UAT saves time. Breaking things before we start protects us from having to fix things down the road. We’re doing our future selves a favor.

An Ounce of Prevention (UAT Best Practices)
To reap the benefits of UAT, you need to take the time to develop a thorough testing plan. Yes, it’s cathartic to just start hacking away like they do on HGTV home renovations, but we are striving for a more targeted probing of the data platform. Poking, not smashing. We need a plan of attack that focuses on key areas of risk.

During UAT planning:

Don’t reinvent the wheel. It is possible to invest an unlimited amount of time in testing, so limit your UAT scope to areas of the system that would not have been covered by the documented validation testing carried out by the software vendor. Do you need to test that a date field only allows a date to be entered? Do you need to verify that all the data entered is included in an export? Items like these were probably already covered in earlier stages of vendor testing (e.g., performance qualification (PQ) testing). Instead, focus your UAT scope on the custom things you have configured in the software platform. For example, the following types of questions should be addressed in UAT:

  • Are user permissions set correctly?
  • Do forms collect the right study data?
  • Are data validations functioning correctly (e.g., what happens if we enter a BMI of 236 instead of 23.6?)?
  • Are calculated fields showing the right data?
  • Do form logic and rules you have defined work as intended?
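As an illustration of the BMI example above, here is a sketch of the kind of range check a UAT tester should deliberately try to break. In OpenClinica such edit checks are configured on the form rather than hand-coded, so this Python function (and its range) is purely illustrative:

```python
def check_bmi(value, low=10.0, high=60.0):
    """Return None if the value is plausible, otherwise a query message.

    The [10, 60] plausibility range is an invented example; real limits
    belong in the study's data validation plan.
    """
    if not low <= value <= high:
        return f"BMI {value} is outside the plausible range [{low}, {high}] -- please verify"
    return None

print(check_bmi(23.6))   # None -> value accepted
print(check_bmi(236))    # returns a query message
```

A good UAT plan exercises both sides of every such boundary: the value that should pass, and the fat-fingered one that should fire a query.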

Define the risks. You can further enhance the effectiveness of your UAT by identifying which parts of the study are most critical. For example, which fields, logic, and workflows support safety? Which support primary endpoints? What data show inclusion/exclusion compliance? Be sure to define robust tests for these areas in your UAT plan.

Identify the Users. It’s great that you’re willing to get your own hands dirty, but the “U” in UAT isn’t texting shorthand for “you”. It’s “user”, which could be you but is more likely a different member of the research team. We need to find these users, then bribe (er…reward) them for participating. Create a friendly competition, or individual bounties for identifying flaws and errors. More importantly, make sure your new “users” have been given a demo of the system’s functionality and understand the workflow itself. Far too often, UAT “findings” are raised by “users” working the system sight unseen, which only produces erroneous findings to sift through later. Finally, if you failed to include someone from Stats in your design period (tsk-tsk, a design best-practice oversight), don’t leave them out here either. Stats feedback during UAT is crucial: it gives them a seat at the table to help solidify the expected output. If Stats, or more specifically a Stats programmer dealing with study data output, can give their approval here, you’ve got a friend for life!

Document Testing Results. Create a bug tracking form or error reporting tool. At minimum, we want the user to report:

  • where they encountered the bug, spelling error, missing data options, etc. (“Where” as in Form, Field name)
  • what did they expect to happen (“What” exact steps did they take, what exact data did they enter into the field)
  • what actually happened (“What” error message did they receive, what rule should have fired but failed, what data options did they expect to see that are MIA)
  • what priority applies to this finding? (Think in terms of high, medium, and low. High is the “show-stopper” stuff: there is no way you can go live unless it is resolved. Picture a flashing stop sign. Low may be something that applies to a later visit and would not impact your FPFV (first patient, first visit). Why is this distinction important? Oftentimes UAT is carried out in crunch mode (we know, understatement of the year!), but the ability to negotiate which findings must be addressed before FPFV vs. which can be addressed slightly after as a post-go-live fix may mean the difference between hitting and missing a very important milestone.)

To encourage as much feedback as possible, give your testers a simple and straightforward way to log results. You don’t need a fancy ticketing system. A simple spreadsheet can work great.

For example:

| Where (form, field) | What was expected | What actually happened | Priority |
| --- | --- | --- | --- |
| Vitals form, BMI field | Entering 236 should fire a range-check query | No query was raised | High |
| Medical History form, condition list | All protocol-specified options listed | One option is missing | Low |

Get User Feedback. This step is separate from bug tracking. Your users just spent days kicking the tires on your shiny new system. You’re missing a golden opportunity if you don’t separately ask about their overall user experience. And while you may capture some aspects of the experience through surveys, you’ll likely get more useful feedback through personal interviews. Here are some questions you might include in such interviews:

  • What do you find easiest/most intuitive?
  • What do you find challenging or confusing?
  • What would be something you’d recommend we improve?

If we can be of any help with UAT or any additional needs, let us know. Thank you.

Clinical Data Science for Non-Data Scientists (Part 1)

Einstein is often credited with saying, “If I had an hour to save the world, I would spend 55 minutes defining the problem.” In that spirit, we begin by defining what we mean by clinical data science, at least for the purposes of this blog series. CDS is the application of data science to medical and therapeutic decision-making, including research applications. Simple, right? So far, so good?

Well, now comes the hard part: defining “data science” itself, as even seasoned professionals can be inconsistent with terminology. Is a given example best characterized as data science, data analytics, or perhaps business intelligence? Do you need a data visualization expert? Perhaps we should just give everything to a statistician and hold our breath? These distinctions are more than purely academic, as finding the right person with the right skillset can seriously impact your outcome.

Before going any further, it is worth noting that for historical reasons, as much as anything, many of these job titles overlap, and many who perform them could have almost identical skillsets. The definitions and distinctions offered here may be one of many frameworks possible, but they seem to represent a plurality and perhaps the early signs of a convergent system. To help unpack these concepts, let’s examine them through the lens of when a given question would likely be addressed.

  • A statistician typically looks at data through a hindsight lens. They will generally try to answer questions about what happened and frame it in relation to attempting to pinpoint a true value.
  • A data visualization expert creates dashboards and similar tools to help decision-makers interpret data faster and easier.
  • A business intelligence expert will try to pull information from multiple sources, combine it with a knowledge of business operations, and make decisions largely based on the business’s own interests.
  • A data analyst will typically analyze past data, often with some statistical inference techniques, to make predictions and decisions about future events.
  • A data scientist uses real-time, or near real-time, data to make decisions through largely algorithmic techniques.

As you can see, the difference between the definitions of data analyst and data scientist is subtle, but the key to understanding it is that a data scientist’s approach will tend to adapt much faster than a data analyst’s. In essence, the data analyst may be examining trends a month, a quarter, a year, or more old. This is not to say that such analyses are not useful – year-over-year performance analysis, one of the key business metrics, is exclusively in the purview of a posteriori analysis.

The advent and evolution of clinical data science as a discipline opens some exciting possibilities for clinical trials. A greater emphasis on adaptive trials and real-world evidence can increase the speed of trials and the validity of results. In fact, in some diseases, especially those with a strong genetic component (such as oncology or autoimmune diseases), an adaptive clinical trial can often decrease time, costs, and risks to both subjects and sponsor alike (Pallmann, Philip. Adaptive Designs in Clinical Trials: Why Use Them, and How to Run and Report Them. 2018). Umbrella, basket, and platform trial designs have all seen an increase in the past decade as:

  1. knowledge of molecular genetics has increased,
  2. the cost and difficulty of molecular techniques have decreased,
  3. and the complexity of interim analysis has decreased.

A knowledgeable clinical data scientist could even conceivably program a majority of interim analyses to run repeatedly (using an appropriate correction for multiplicity issues), which would decrease the downtime that can hinder adaptive designs. This approach could also end trials that have become unproductive or unnecessary earlier than a traditional approach would.
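To make the idea concrete, here is a toy sketch of pre-programmed interim looks with a simple Bonferroni correction for multiplicity. A real adaptive trial would use a proper group-sequential boundary or alpha-spending function (e.g., O’Brien-Fleming), so treat this only as an illustration of automated, repeated looks against a corrected threshold:

```python
import math

def two_sided_p(z):
    """Two-sided p-value for a standard-normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))

def run_interim_looks(z_stats, alpha=0.05):
    """Stop at the first look whose p-value crosses the corrected boundary."""
    threshold = alpha / len(z_stats)   # Bonferroni-corrected per-look alpha
    for look, z in enumerate(z_stats, start=1):
        if two_sided_p(z) < threshold:
            return look                # stop early for efficacy
    return None                        # no early stop; trial runs to completion

# Three planned looks; only the last crosses the corrected boundary.
print(run_interim_looks([1.1, 1.9, 3.4]))   # 3
```

Because the looks are pre-programmed, each interim analysis can run as soon as the data arrive, with no database-freeze downtime in between.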

Almost all parties at all levels in clinical trials can benefit from the use of data science. Industry sponsors could see the most direct benefits in both cost and time reduction. Traditionally a large pharmaceutical company may value data science mostly to achieve time savings. In contrast, smaller start-ups and midsize companies might see value primarily on the side of cost savings. A sponsor can reduce the resources required to bring a product to market by applying a more adaptive design to trials that can support a greater frequency of interim analyses without the traditional overhead of scheduling a database freeze, completely resolving queries, and having a small army of statisticians and programmers spend weeks only to then have to present the data to a DSMB or similar body.

A clinical data scientist could theoretically preprogram all the necessary analysis and utilize any number of machine learning techniques to mitigate many unforeseeable circumstances, such as missing or erroneous data, outliers, or noncompliance. Machine learning, coupled with Natural Language Processing and search engine spiders, a.k.a. web crawlers, could conceivably enable sites and sponsors to monitor web forums, messages within EHR systems, and many more systems for SAEs or underreported adverse events. Similarly, a clinical data scientist could use emerging technologies to gather and process data to a degree that even 5 years ago would have seemed impossible. While technology continues to evolve at a substantial pace, it seems likely that humans will always be part of the process; therefore, it is unlikely that overhead can ever be reduced to zero, but data science can greatly reduce it.
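A deliberately naive illustration of that monitoring idea: a keyword scan over free-text messages for possible adverse-event language. A real system would use trained NLP models rather than keyword matching, and every term and message here is invented for the example:

```python
# Toy adverse-event (AE) screen over free-text posts. Real systems
# would use NLP models; keyword matching is only a sketch of the idea.
AE_TERMS = {"rash", "nausea", "dizziness", "fainted", "headache"}

def flag_possible_aes(messages):
    """Return (message, matched terms) pairs worth a human's review."""
    flagged = []
    for msg in messages:
        hits = {t for t in AE_TERMS if t in msg.lower()}
        if hits:
            flagged.append((msg, hits))
    return flagged

posts = [
    "Week 3 on the study drug and feeling fine.",
    "Anyone else get a rash and some nausea after dose 2?",
]
for msg, hits in flag_possible_aes(posts):
    print(sorted(hits))   # ['nausea', 'rash']
```

Even in this crude form, the point stands: a machine can surface candidates around the clock, and a human reviews only the flagged handful.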

CROs and SMOs tend to take a more business intelligence approach to clinical data science. It makes sense that if site A sees 5x more lung cancer patients than site B, then it should reasonably follow, all things being equal, that site A should get the phase III trial for a novel targeted lung therapy. This is an oversimplified example, but it illustrates the point that data can and should inform business decisions wherever possible. If a CRO notices a sharp uptick in the number of queries for a given site, they might arrange for retraining at the next monitor visit to increase compliance.

Regulatory agencies and quality assurance departments can use data science to increase the effectiveness of risk-based monitoring programs and to distribute routine audits more efficiently. The US FDA’s BIMO (Bioresearch Monitoring) program already uses some algorithmic approaches to determine where to send inspectors. The next logical evolution would likely be to incorporate machine learning to make those algorithms more capable, perhaps using Natural Language Processing to spot trends in FDA 3500 forms (for safety reporting) or even in social media groups discussing research experiences.

Research sites, too, can use data science methods to increase their own efficiency. Let’s be frank; we’ve all at least heard of sites that will apply to join every study even remotely applicable to their practice. It’s understandable – in the purely academic realm, the term of art is “publish or perish,” and if you don’t stay in the black, you won’t be a site for very long. This shotgun approach to research participation can have unintended consequences, however. Even something as simple as completing a study application form can take hours, contracts can take days, and study start-up can take weeks. All this time is, as they say, money. A data-driven approach can guard against this tendency.

DIY Study Build at 2X the Speed

OpenClinica Study Designer

Do you wish you could make the leap to build your own study within an EDC solution? Perhaps you have relied on CROs or external developers and want more control, increased speed, or better cost effectiveness. Many of our customers come to OpenClinica specifically for these reasons.

Most electronic data capture systems are not designed for the average data manager to build studies. These systems often require a high level of technical expertise from specialized clinical programmers. On the other hand, systems that data managers may find easy to use are often oversimplified, lacking the capability and functionality necessary to run a trial effectively. OpenClinica gives data managers the best of both worlds: exceptional capabilities and the ability for today’s clinical data managers to deploy them in their studies.

Let’s start by taking a look at some of the key ways the OpenClinica Study Designer delivers on this promise.

Building Study Events (e.g. Visits)

Events are the building blocks of your study’s architecture. This is where your protocol foundation is visualized, eCRFs are placed, and ultimately your study calendar is set. These building blocks can’t be set in concrete, however; they must allow for flexibility in adaptive trials, during early-phase setup, and for any potential mid-study changes or protocol amendments down the road.

The following video shows how easy it is to create and modify study events:


Seamlessly engage patients

The Covid-19 pandemic underscores the need to be able to carry out trials more remotely, and as a result researchers are relying more heavily on ePRO, eCOA, and eConsent. These capabilities can be seamlessly added within the OpenClinica Study Designer. In fact, 90% of our clients are incorporating this digital capability within their studies now. 

Select a single checkbox to enable an ePRO form.

Create ultra-smart eCRFs

The electronic case report form (eCRF) is the heartbeat of your study. OpenClinica eCRFs are incredibly capable and smart: they never ask the user for a value the system already knows (or can deduce). Real-time edit checks and dynamic logic create a tactile and engaging experience. User forgot to hit the save button? No worries, OpenClinica forms automatically save your data. Whether a simple quality-of-life questionnaire or a form with edit checks, skip logic, and complex calculations, OpenClinica forms can have a tremendous impact on your study’s overall productivity.

Check out our Ultimate eCRF Design Guide to see some OpenClinica forms in action. All the extraordinary things these forms can do can be set-up by a clinical data manager (you don’t need to be a programmer). 

Stay tuned for a future post on the OpenClinica Form Designer and Form Library. 

Collaborate with stakeholders

The pace of building a study is heavily impacted by the ability to collaborate and iterate with your team. With OpenClinica, you can avoid the countless emails and inefficient meetings which can frustrate your team and inflate the timeline unnecessarily.

OpenClinica’s Study Designer enables real-time and asynchronous review. Invite a colleague to review a form and post comments. See updates to the study design without having to refresh your screen. Use labels and checklists to track the review and approval process and keep everyone on the same page. Increase your velocity.

One-click publish

From the Study Designer, you only need a single click to publish your study to OpenClinica’s test or production RTE (run time environment).

What you don’t see, running behind the scenes, is all of the automation, API calls, and modules associated with getting all your events, forms, and definitions ready for go-live. This is another example of powerful capabilities wrapped in a friendly, easy-to-use interface.

By publishing to your dedicated test environment, your team can perform all the training and testing required until you are ready to one-click publish to production. And don’t stress when the all-too-certain protocol amendments occur: you can make your study modifications quickly in the Study Designer, whether event-based or form-based. Rest assured, OpenClinica has fail-safe mechanisms for preserving study and data integrity.

Your stakeholders will be impressed at how you can turn around changes in hours, not weeks.

No more long build times that bleed into study execution

No more advanced technical demands on data managers

No more wasted time in reviewing and publishing your study

See efficiency gains live – get a demo now.

Supporting RECOVERY for COVID-19

The largest randomized clinical trial for COVID-19 treatments is making great strides in adaptively testing treatment options for the UK and the entire world population. We are proud to support the efforts of Oxford University’s Nuffield Department of Population Health and Nuffield Department of Medicine by capturing the critical trial data and enabling the flexibility of the adaptive platform design.

The trial is being conducted in over 170 NHS hospital sites across the UK with over 10,000 participants to date. Patients who have been admitted to hospital with COVID-19 are invited to take part. The Chief Medical Officers of England, Wales, Scotland and Northern Ireland, and the NHS Medical Director are actively encouraging enrollment in this study and other major studies in the UK.

The trial is being coordinated by researchers at the University of Oxford, led by Peter Horby, Professor of Emerging Infectious Diseases and Global Health in the Nuffield Department of Medicine and Martin Landray, Professor of Medicine and Epidemiology at the Nuffield Department of Population Health.

Trial Treatments

  • Lopinavir-Ritonavir (commonly used to treat HIV)
  • Low-dose Dexamethasone (a type of steroid, typically used to reduce inflammation in a range of conditions)
  • Hydroxychloroquine (related to an anti-malarial drug)
  • Azithromycin (a commonly used antibiotic)
  • Tocilizumab (an anti-inflammatory treatment given by injection)

Trial Design

RECOVERY has an “adaptive” design, with data analyzed on a rolling basis so that any beneficial treatments can be identified as soon as possible and ineffective ones dropped from the trial.

“One of the biggest challenges with trial setup was developing the randomization capabilities, which we built in-house,” said Andrew King, Head of Trials Programming. The study includes several arms, with the flexibility to add and modify treatments quickly. There is also a second randomization for tocilizumab versus standard of care. The randomization data is easily ingested into the randomization forms within OpenClinica through the API.

“Building the study and the forms within OpenClinica was very straightforward and quick,” according to David Murray, Senior Trials Programmer. Site staff enter the trial data into the forms, primarily at the 28-day mark. “Another challenge operationally was training the over 4,000 site personnel on the randomization protocols and the EDC system. We found the training for OpenClinica’s EDC to be inconsequential in the process,” said David Murray.

For regular news, updates, and findings, be sure to visit the RECOVERY site. We continue to support many organizations in their efforts to research and ultimately discover effective and safe vaccines and therapies.

Feel free to contact us with any questions about how we can help with your clinical studies.


OpenClinica’s Response to the Coronavirus

At OpenClinica, we are working hard to ensure our users, especially those working on COVID-19 projects, are fully supported with the tools, service, and performance you’ve come to depend on from OpenClinica. Our team has adjusted seamlessly to a remote-work model and is fully operational. We have business continuity procedures in place, on which all of our staff are trained yearly. These procedures, which include provisions for remote work, ensure that we can continue to provide you with the service you require in a number of disaster-recovery situations, including pandemics. We are following the guidance of the CDC as well as state and local public health officials and emphasizing that our employees do the same.

We consider it a privilege to be supporting multiple, highly consequential COVID-19 related research projects. Some of these have already come online and others will in the coming weeks/months. At the same time, we recognize the stress that the pandemic environment is placing on other research studies. Our products include capabilities to help transition many research activities that formerly were done in-person to fully electronic and remote activities. We are helping customers use tools like OpenClinica Participate to increase participant engagement via mobile and web communication, in many cases from recruitment all the way through study completion. More and more are adding source document upload fields to their studies to support remote monitoring. OpenClinica Insight is proving to be a highly useful tool to review clinical data and operational metrics in real time, and to deliver automated notifications and alerts where and when they’re needed most.

We are committed to ensuring the health and safety of our customers and staff and to making sure our products and services are fully available to you. Our solutions are continuing to be delivered with the same levels of uptime, performance, and support that you have come to expect from OpenClinica. Our love and gratitude go out to the true heroes, all the healthcare providers on the front lines of this crisis.

We value your trust and are committed to continuing to exceed your expectations. Please reach out to us if you have any questions or if there is anything you need. We are ready to help!


To counter bias, counterbalance

wooden scale with dice in background
On a scale from 1 to 10, with 10 representing utmost importance, how important is a healthy diet to you?

Do you have your answer? If so, I’ve got another question.

Which appeals to you more, a cozy bed or a luxurious meal?

Yeah, me too.

These are hardly clinical questions, but as far as survey items go, they’re instructive examples. The wording for each is clear. The response options are distinct. The structures, a scale and a multiple choice, are familiar. But if we want valid answers to these questions, we’ve got some work to do.

When designing a survey, it’s easy to overlook the effects its format could have on the responses. But those effects are potent sources of bias. In the example above, the first question primes you to think about diet and health. In doing so, it stacks the deck against the “luxurious meal” response in the following question. But the trouble doesn’t end there. Although “bed” and “meal” make for a short list, one of them appears before the other. The primacy effect (the tendency of a respondent to choose the first in a list of possible responses, regardless of question content) puts “luxurious meal” at a further disadvantage.

The good news is that surveyors (and data managers) have tools to mitigate these biases. Modern EDC allows you to systematically vary both question and response option order, either by randomly selecting from a set of all possible permutations, or rotating through a block of permutations one participant at a time. The practice, called counterbalancing, guards against unwanted order effects.
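To make the rotation scheme concrete, here’s a minimal sketch in Python. It’s illustrative only (the real logic would live in your EDC form configuration, not a script), and the option labels and `option_order` helper are made up for the example:

```python
from itertools import permutations

# Illustrative response options for the "bed or meal" question.
options = ["a cozy bed", "a luxurious meal"]

# Every possible ordering of the options (n! of them).
orderings = list(permutations(options))

def option_order(participant_index):
    """Rotate through the block of orderings one participant at a time."""
    return orderings[participant_index % len(orderings)]

# Participant 0 sees one ordering, participant 1 sees the other,
# and the cycle repeats, so each ordering is shown equally often.
```

With only two options there are just two orderings, so rotating amounts to alternating; with more options, the same modulo trick walks through all n! permutations in turn.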

But it isn’t a cure-all. Consider the practice of rotating through all permutations of your response options. No matter how a set of response options is ordered, one of them has to be placed first. The primacy effect, then, isn’t so much diminished as distributed among all the response options. To illustrate, suppose we ask the two questions above in alternating order to 1,000 respondents, all of whom responded. In the end, you may discover that 82% of the “bed or meal” respondents chose “bed,” while only 16% of the “meal or bed” respondents chose “bed.” Results like these ought to make you suspicious. If there’s no reason to believe the two cohorts differ (apart from the phrasing of the question posed to them), it’s premature to conclude that the population is split almost evenly in its preferences. The majority of respondents selected whichever option they encountered first, so it’s much more likely that you’ve confirmed the power of the primacy effect.

The same caveat applies to question order. Imagine that our example survey always posed the “bed or meal” question before the “healthy diet” question. Regardless of how the respondent answers the first question, she’s now in a state of mind that could influence her next response. (“Ooh, I love luxurious meals. I guess a healthy diet isn’t that important to me,” or “I need better sleep more than I need a rich entrée. I guess a healthy diet is important to me.”) To counterbalance, we might alternate the order in which these questions appear. Still, priming may occur in both orderings.

So how do we know if order effects have influenced our results? (Perhaps the better question is: how do we determine the degree to which order effects have influenced our results?) First, it’s important to know which variant of the survey each respondent answered, where variant refers to a unique order of questions and response options. Our example survey comes in (or should come in) four variants:

  1. Rate the importance of diet, then choose between meal or bed
  2. Rate the importance of diet, then choose between bed or meal
  3. Choose meal or bed, then rate the importance of diet
  4. Choose bed or meal, then rate the importance of diet

All respondents, then, fall into exactly one of these four “variant cohorts.” Let’s assume further that these cohorts differ only in the survey variant they answered; that our experimenters randomly selected the respondents from the same target population, and administered variant 1 to respondent 1, variant 2 to respondent 2, and so on in a cycle.
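Under those assumptions, the cyclic assignment is easy to sketch in Python (the variant labels below are shorthand for the four orderings listed above, and `assign_variant` is a hypothetical helper, not a product feature):

```python
# The four survey variants, numbered as in the list above.
VARIANTS = [
    "diet rating, then meal/bed",   # variant 1
    "diet rating, then bed/meal",   # variant 2
    "meal/bed, then diet rating",   # variant 3
    "bed/meal, then diet rating",   # variant 4
]

def assign_variant(respondent_number):
    """Variant 1 to respondent 1, variant 2 to respondent 2, and so on in a cycle."""
    return VARIANTS[(respondent_number - 1) % len(VARIANTS)]

# Respondents 1-4 cover all four variants; respondent 5 starts the cycle again.
```

Because the assignment cycles, each “variant cohort” ends up with (nearly) a quarter of the respondents, which is what makes the cohort-to-cohort comparisons meaningful.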

If, when comparing these cohorts, we find their aggregate responses diverging significantly from one another, we should suspect that ordering effects have distorted our results. All things being equal, the greater the divergence, the more significant the impact of order effects. Our experimenters were careful in recruiting similar respondents, after all, so the profile of responses from any subset should more or less match the profile of responses from any other subset. If that’s not happening, something other than question content is at play.

Precisely quantifying the impact of order effects is the business of professional statisticians, a noble breed from which the present writer stands apart. But as data managers, we owe it to good science to understand the concepts at play and to stand vigilant against their influence. In the end, the truth may not be balanced. But our instruments for finding it should be.

Click the image below to experiment with a counterbalanced form

Page of web form showing a question and four possible responses

Spotlight on: combinatorics!

How many ways are there to order n distinct items? Let’s ask the Brady Bunch!

In the photo above, Cindy stands at the top of the staircase. But it might just as well have been Greg, or Marcia, or even Alice. (She is practically family.) In fact, the director might have chosen any one of the 9 Bradys (or honorary Bradys) to take the top spot. So there are at least 9 ways to arrange this loveable clan. But once the top spot is claimed, we have 8 choices remaining for the next spot. Multiply 9 possibilities for the top spot by 8 possibilities for the second, and we discover that there are at least 72 ways to arrange this brood. But, much like reunion specials and spin-offs, the madness doesn’t end there. We now have to fill the third spot from the 7 remaining Bradys. Multiply the 72 orderings for spots 1 and 2 by the 7 possibilities for spot 3, and we’ve already hit 504 line-ups. Keep going, and you’ll discover that there are 362,880 ways to order one of America’s favorite families alongside one of America’s ugliest staircases.

Of course, you recognize the math here. It’s just 9 factorial. And while n-factorial grows pretty darn fast as n grows, these values pose little to no challenge for computing devices. OpenClinica happens to run on computing devices, so we have no problems with these values either. Combine that performance with our features for generating random numbers (or changing form attributes according to participant order or ID, or both), and you have all the tools you need to implement counterbalancing on any scale.
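If you’d like to check the arithmetic, or see one way a participant-keyed ordering might work, here’s a short Python sketch. The `shuffled_options` helper is hypothetical (not an OpenClinica feature); it just shows how a participant ID can seed a reproducible random ordering:

```python
import math
import random

# 9 Bradys on the staircase: 9! possible line-ups.
assert math.factorial(9) == 362880

def shuffled_options(participant_id, options):
    """Derive a reproducible ordering of options from a participant ID."""
    rng = random.Random(participant_id)  # same ID -> same ordering, every time
    shuffled = list(options)
    rng.shuffle(shuffled)
    return shuffled
```

Seeding by participant ID makes the ordering random across participants but stable for any one participant, which is handy when a form is reopened and must render exactly as it did before.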

And that’s much more than a hunch.

Souvenirs from Baltimore (SCDM 2019)

Thank you to everyone who helped make SCDM 2019 another fantastic learning opportunity. We were delighted to catch up with old friends and make dozens of new ones. If you weren’t able to visit our booth, attend our product showcase, or catch our panel discussion on key performance indicators, don’t worry — we captured the insights for you. You can download articles, best practices, and more right from this page.

Register now for OC19: All Hands on Data

OC19 All Hands on Data

Register now!


This year, sail the seas of OC4 in Santander, Spain.

This year, it’s all about discovery and doing. We’ll spend our time together working directly in OC4: creating studies, building forms, and becoming familiar with the dozens of new features and enhancements that continue to make our current solution the solution data managers can rely on for performance, flexibility, and security.

Two days packed with 30- to 90-minute workshops on:

  • Multiple queries, multiple constraints, and item annotations
  • Hard edit checks
  • Moving from datamart to Insight
  • Insight for key performance indicators (KPIs)
  • The power of external lists
  • Collecting and safeguarding Protected Health Information (PHI)
  • OC4 APIs
  • Data imports
  • Single sign-on
  • Conditional event scheduling
  • An early look at Form Designer
  • FAQ on OIDs
  • XPath functions every user should know
  • CDASH forms
  • Getting to SDTM

Want to take part in OC19 but can’t travel to Spain? Register and join us via webcast! (Super User Trainees must attend in person.)

All registrants will receive access to an OC4 sandbox study in advance of the conference.

Register now!