In the months ahead, Journal for Clinical Studies will publish a detailed guide to designing eligibility forms–a guide authored by OpenClinica! The complete contents are embargoed until they appear (for free) on the journal’s website. As soon as it’s published, we’ll provide a link to it here. In the meantime, here’s a brief excerpt and an interactive form illustrating one of the guide’s four core principles, “Make your forms carry out the logic.”
from The Four Criteria of a Perfect Eligibility Form: A Success Guide, forthcoming in Journal for Clinical Studies
Think a moment about the human brain. Specifically, think about its capacity to carry out any logical deduction without flaw, time and again, against a background of distractions, and even urgent medical issues.
It doesn’t have the best track record.
Even the most logical research coordinator could benefit from an aid that parses all of the and’s, or’s, and not’s scattered throughout your study’s eligibility. A good form serves as that aid. Consider the following inclusion criteria, taken from a protocol published on clinicaltrials.gov.
Inclusion criterion #1 is straightforward enough. (Although even there, two criteria are compounded into one.) By contrast, there are countless ways of meeting, or missing, criterion #2. It’s easy to imagine a busy CRC mistaking some combination of metformin dose and A1C level as qualifying, when in fact it isn’t.
But computing devices don’t make these sorts of errors. All the software needs from a data manager is the right logical expression (e.g., Criterion #2 is met if and only if A and B are both true, OR C and D are both true, etc.). Once that’s in place, the CRC can depend on your form to deliver perfect judgment every time. Best of all, that statement can live under the surface of your form. All the CRC needs to do is provide the input that corresponds to A, B, C, and D. The form then chops the logic instantly, invisibly, and flawlessly.
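To make the idea concrete, here is a minimal sketch of such an expression, written in Python rather than in any particular form-definition language. The function name and the inputs a, b, c, and d are placeholders for whatever yes/no answers a protocol defines; they are illustrative and not taken from the protocol excerpted above.

```python
def criterion_met(a: bool, b: bool, c: bool, d: bool) -> bool:
    """Return True if and only if (A and B) or (C and D)."""
    return (a and b) or (c and d)


# The CRC supplies only the four inputs; the expression does the rest.
print(criterion_met(True, False, True, True))   # second pair qualifies
print(criterion_met(True, False, False, True))  # neither pair qualifies
```

The point of the sketch is that the person entering data never sees or evaluates the expression. Only the four inputs are collected.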
Test drive the form below to see a smart eligibility form in action. OpenClinica customers, be sure to visit the Eligibility section of the CRF library to download the form definition.
For more on designing forms that capture better data, faster, view our on-demand webinars from December 2018.
What’s the upper limit of normal for luteinizing hormone in men? How about the lower limit of normal for women? Questions like that probably have you thinking, “check the table.” Tables are an intuitive and powerful way to match variable input like gender and age with a corresponding value. The same tried-and-true data structure can associate the key terms of an adverse event with standard categories and codes. OC4 makes it easy to tie tables to your forms. Let us show you how! Join us for a quick 30-minute demo on Tuesday, 4/23 at 10am Eastern. We’ll examine two use cases from the form-filling and form-building side: one on applying lab normal ranges, the other on retrieving term codes and classifications from the CTCAE. We’ll show you how this method generalizes to other uses, too. So whether you’re coding and collecting specimens or not, don’t miss your chance to learn how to put a classic tool to new use.
If the question is difficult to ask, it’s even harder to answer. Ask an actuary. Calculating life expectancy is a complex matter; more complex, at least, than plugging your date of birth and today’s date into a function. An informative life expectancy depends on a host of additional factors, like your sex, current health, and lifestyle habits.
“Multifactorial” calculations like the one above dominate medicine, so it’s no surprise that they should dominate clinical research, too. Take a plasma urea level of 39 mg/dL. Is that above, below, or within the normal range? The question is misconceived, because normal in this case is relative to patient age. A 30-year-old’s “slightly above normal” is a 60-year-old’s “slightly below normal.”
Age is only one factor. For many ranges, patient gender, ethnicity, and co-morbidities, in addition to age, determine a normal range. Often, researchers can set these factors aside without raising undue safety concerns or undermining the generalizability of their results. But as personalized medicine continues to inform drug discovery and clinical care, researchers will turn to more finely-grained reference data more often. For this reason, data management systems must make it easy for these researchers to apply reference data that’s sensitive to as many factors as they choose.
Of course “easy”, just like “slightly below normal”, is a relative term–for the most part. In no context is writing a lengthy formula of nested “if, then” clauses easy, e.g.
If the participant is male and Hispanic and between 18 and 25 years old and the test is for ALT, then set the lower limit to 12 U/L and the upper limit to 102 U/L, and if the participant is male and Hispanic and between 26 and 34 years old and the test is for ALT, then set the lower limit to…
Completing the formula above would mean assigning a lower and upper bound to every combination of gender, ethnicity, and age range. The process could easily take hours, just to set the normal limits of ALT. If the study involved a dozen analytes, the data manager would need to devote the better part of a week to programming these constraints. If, at a later date, any one of those constraints changed, he or she would face the unenviable task of modifying (without breaking) the original formula. Too many “modern” EDC systems force the data manager to soldier through this error-prone task. With paper, it’s a non-starter.
How much better, then–for efficiency and quality–to rely on a general constraint; one that leverages a tool that’s easy to build, easy to read, and easy to amend? I’m talking about the humble table.
Yes, the table. For all our advancements in data architecture, the same grid that set us on the path to multiplication in second grade remains an asset today. It’s human readable, it’s intuitive, and it’s powerful.
Powerful? Really? How much can you accomplish with just two axes?
Great question! It’s true that most spreadsheet applications don’t offer more than two axes, at least not through their GUI. But who needs them when you have thousands of rows and hundreds of columns at your disposal?
Suppose I need to assign a unique value to every combination of three hand preferences (left, right, or ambidextrous), four eye colors (blue, green, brown, or hazel), and the eight blood types (O+, O-, A+, A-, B+, B-, AB+, AB-). At first blush, it seems a table won’t suffice. I have more dimensions (three) than I do axes (two). But a single axis can accommodate any number of dimensions, because nothing prevents me from treating each combination of values on those dimensions as its own, n-factored value. For example, I can treat each triad of handedness, eye color, and blood type as one of 96 phenotypes.
Laying these combinations along a vertical axis, I can assign a value to each with just two columns.
Maybe I’m partial to a more compact format. If so, I can combine the variables from two dimensions to specify one axis, and let the variables from the third dimension define the other:
Here I make the 96 assignments with 13 rows and 9 columns. (The virtue of this method is fewer total cells.)
In any case, I’m free to work with as many factors as the situation demands, and distribute them between the two axes in any way that makes the most sense to me. Leaning on a familiar format, I’ve made the difficult part of a multifactorial reference much easier. All that remains is to add to the form a simple instruction for “looking up” the values needed. Even if those values change, the form doesn’t need to.
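The two layouts described above can be sketched in a few lines of Python. This is only an illustration of the combinatorics: the "|" separator, the function name look_up, and the values 1 through 96 are placeholders I have assumed, not anything a particular EDC prescribes.

```python
from itertools import product

HANDEDNESS = ["left", "right", "ambidextrous"]
EYE_COLORS = ["blue", "green", "brown", "hazel"]
BLOOD_TYPES = ["O+", "O-", "A+", "A-", "B+", "B-", "AB+", "AB-"]

# Long layout: one row per combination, keyed by a single composite label.
# The assigned values (1..96) are placeholders for illustration only.
long_table = {
    f"{hand}|{eye}|{blood}": value
    for value, (hand, eye, blood) in enumerate(
        product(HANDEDNESS, EYE_COLORS, BLOOD_TYPES), start=1
    )
}

# Compact layout: handedness and eye color share the row axis,
# while blood type defines the column axis.
compact_table = {}
for key, value in long_table.items():
    hand, eye, blood = key.split("|")
    compact_table.setdefault(f"{hand}|{eye}", {})[blood] = value


def look_up(hand: str, eye: str, blood: str) -> int:
    """Retrieve the value for one combination from the long layout."""
    return long_table[f"{hand}|{eye}|{blood}"]


print(len(long_table))     # number of data rows in the long layout
print(len(compact_table))  # number of data rows in the compact layout
```

Either layout holds the same 96 assignments. The choice between them is a matter of readability, not capability.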
Fair enough. But won’t real use cases require gargantuan tables?
A mere 972 rows (plus one header) accommodates every combination of age, ethnic and racial category, and gender. 80 columns (plus one on the left for analyte names) accommodates the 40 lower and 40 upper limits. The resulting 973 x 81 grid is small potatoes for database applications that power software like OpenClinica’s. Simple formulas in that context can retrieve the value from any coordinate within milliseconds.
Great. But what’s the big deal? I hardly ever need to apply reference data for this many factors at once.
Yes, a heart rate is a heart rate, and while population differences might exist for this measure, they’re hardly a concern on your vitals form. But don’t confuse the frequency of a need with its importance. Take safety. An insignificant drop in a lab value for one patient may portend real danger for another. Even apart from lab interpretation, though, tables can drive efficiency and accuracy. Dosing can vary between countries participating in the same study, due to differences in labeling and regulation. The same goes for eligibility and arm allocation. Whenever we try to account for these variables within our form, we accept programming delays and chances for error that we don’t need to accept. It is possible, of course, to make an error when assembling our table, but those errors are easier to spot and correct within a grid than they are in some extended, conditional formula. The tables themselves are easier to build in the first place, too, as their source data usually comes to us in the form of a spreadsheet. A little re-labeling of our first row and column, some testing, and voilà: trusted reference values are now a part of our study.
The lesson is simple, then. First, make sure you’re using the right EDC. Your form builder should allow you to specify reference data with tables, and your forms themselves should retrieve values in that table based on user input all but instantly. Second, use your two axes to their full potential: fill those rows and columns with as many dimensions as are relevant by tapping some basic combinatorics. Third, congratulate yourself.
You’ve just used a bit of the time you have left more wisely.
Real-world example: applying lab reference data that’s gender- and age-specific for two analytes
Not every analyte carries with it age- or gender-specific normal ranges. But for those that do, their differences are critical. In this example, I’m concerned with two levels from a blood serum panel: Insulin-like growth factor 1 (IGF-1) and Dehydroepiandrosterone-sulfate (DHEA-S). Both play a key role in several endocrinological disorders, and both have normal ranges that vary by age and gender.
Our example form first asks the user to specify the patient’s sex, patient’s date of birth, and date of sample collection. The form then calculates the patient’s age, in years, at the time of collection.
Next, the user is prompted to enter the value for IGF-1.
As soon as it’s entered, the form compares that value to the upper and lower limits of normal corresponding to the patient’s age and sex, as found on the table below. Note that the user’s selection for gender, together with the calculated age, combine to form a unique key (‘female40’).
The lower limit of normal (igf_ll) for a 40-year-old female is 106 ng/mL. The upper limit (igf_ul) is 267 ng/mL. Because the entered value of 145 falls within that range, no query is raised.
The form then prompts the user to enter a DHEA-S level. For this analyte, the user enters 278 ug/dL. That value is outside the range for a 40-year-old female. As a result, an auto-query instantly fires.
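The sequence just described (calculate age, build the key, look up the limits, compare) can be sketched in Python as follows. This is not the form definition itself, only an illustration of the logic. The one-row table carries the IGF-1 limits quoted above for the key 'female40'. The dates, the out-of-range value of 300, and the function names are hypothetical.

```python
from datetime import date

# A single row of the reference table, using the IGF-1 limits quoted above.
REFERENCE = {"female40": {"igf_ll": 106, "igf_ul": 267}}


def age_in_years(dob: date, collected: date) -> int:
    """Whole years elapsed between date of birth and sample collection."""
    years = collected.year - dob.year
    if (collected.month, collected.day) < (dob.month, dob.day):
        years -= 1  # birthday not yet reached in the collection year
    return years


def make_key(sex: str, dob: date, collected: date) -> str:
    """Combine sex and calculated age into one lookup key."""
    return f"{sex}{age_in_years(dob, collected)}"


def needs_query(key: str, igf_value: float) -> bool:
    """True when the entered value falls outside the looked-up range."""
    row = REFERENCE[key]
    return not (row["igf_ll"] <= igf_value <= row["igf_ul"])


# Hypothetical dates chosen so the participant is 40 at collection.
key = make_key("female", date(1978, 6, 15), date(2019, 3, 1))
print(key)
print(needs_query(key, 145))  # value from the example above
print(needs_query(key, 300))  # hypothetical out-of-range value
```

If the limits change, only the table row changes. The lookup and comparison stay as they are.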
The full reference table includes 191 rows…
1 header row
95 rows for men aged 18 to 112
95 rows for women aged 18 to 112
… and 5 columns…
1 column for the gender-age combinations
1 column for IGF-1 lower limit
1 column for IGF-1 upper limit
1 column for DHEA-S upper limit
1 column for DHEA-S lower limit
Introducing racial and ethnicity categories, along with more analytes, would multiply the area of our table. Six racial and ethnic categories combined with two genders and 95 whole-year ages would generate a total of 1,141 rows (6 x 2 x 95 combinations plus 1 header row). Specifying the upper and lower limits for three dozen analytes would occupy 73 columns (2 limits x 36 analytes + 1 label column). The resulting 1,141 x 73 table would contain 83,293 cells, a total that’s roughly 87 times greater than our original table’s cell count of 955. Should you expect a proportional increase in your form’s response time? Not at all! The “lookup” still happens within milliseconds.
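For readers who want to size their own tables, the row and column counts above follow from a simple pattern that can be sketched in Python. The function names here are hypothetical. The layout assumed is the one described in this example: one header row, one label column, and a lower and an upper limit per analyte.

```python
def table_shape(categories: int, genders: int, ages: int, analytes: int):
    """Return (rows, columns) for a reference table laid out as above."""
    rows = categories * genders * ages + 1  # combinations plus 1 header row
    columns = 2 * analytes + 1              # 2 limits each plus 1 label column
    return rows, columns


def cell_count(shape) -> int:
    """Total cells in a table of the given (rows, columns) shape."""
    rows, columns = shape
    return rows * columns


original = table_shape(categories=1, genders=2, ages=95, analytes=2)
expanded = table_shape(categories=6, genders=2, ages=95, analytes=36)
print(original, expanded)
print(cell_count(expanded) / cell_count(original))  # growth factor
```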
Here in Massachusetts, with the March winds whipping and snow always a threat, a week’s vacation down south is common fantasy. Even if it means a 10-hour car ride, most of us relish the thought.
But suppose our usual set of wheels, a Mini Cooper, say, is in the shop. (Potholes the size of craters are a common reality here.) Instead of forgoing our vacation, we decide to rent a vehicle. Chances are another Mini Cooper won’t rank as our first choice. Sure, a car that size could get us from Boston to the Outer Banks. But at what cost to our comfort and cargo?
We can think of study designs as kinds of road trips, and our eClinical tools as vehicles. Randomized controlled trials (RCTs) and registry studies are only two such journeys, but they’re two of the most frequent we in the research community take. In both cases, most of us rely on electronic data capture (EDC) to help us reach our destination.
How do we choose the EDC “vehicle” that will get us there safely, with minimal delays? Marquee brand names matter less than road-tested features. Consider the relative importance of these EDC features in RCTs versus registries.
Feature: Automatic reporting and notification
RCTs: Important, especially as interim analyses approach
Registries: Very important, to maintain desired balance among subgroup sizes and to ensure that sites contact participants at the appropriate intervals
Feature: Interoperability
RCTs: Important, especially for trials that need to consume a high volume of lab and imaging data on a regular basis
Registries: Very important, as EHR data can easily account for more than half of a registry’s data
Feature: Ease of use
RCTs: Very important, to drive data entry timelines, reduce queries, and ensure quality
Registries: Critically important, for the reasons listed under RCTs, as well as to minimize collection burden and complement the flow of clinical care
Feature: ePRO
RCTs: Often irrelevant, otherwise critically important, depending on whether patient-reported outcomes (PRO) are collected
Registries: Often critically important, as PRO is a far more common data source for registries
Let’s look briefly at each of these four features in turn.
Automatic reporting and notification
Registries may be observational, but make no mistake: there’s still plenty to do, especially when it comes to ensuring the internal and external validity of the study design. As with RCTs, registries begin that task before the first participant is ever enrolled. Inclusion and exclusion criteria define the patient population from which the study will draw. Enrollment targets and duration parameters are set to deliver the necessary statistical power. Data elements are selected ahead of time, as are relevant outcomes.
But RCTs wield two defenses against bias that registries do not: highly specific eligibility criteria, and randomization itself. The first defense minimizes the role confounding factors can play, while the second helps ensure that the influence of confounders is balanced between comparison groups. Registries, on the other hand, because of their greater need to reflect the diversity of the real world, cast “a wider net” with their eligibility criteria. In doing so, the room for selection bias–and confounder impact–grows. And because oversampled patient types are not randomized to one or more groups in a registry, they can distort findings more powerfully.
The registry data manager, then, is often engaged in a constant battle against selection bias. She has no more powerful weapon than real-time reporting, which can signal when enrollment efforts need to be retargeted.
Typically, criteria for registry enrollment aren’t as selective as they are for RCTs. That kind of wiggle room leaves the door open for selection bias. Regular, visual reporting of subgroup counts (e.g. patients of a certain race, ethnicity, sex, age, or socioeconomic status) is indispensable to maintaining a registry population that is representative of the general population with the disease, exposure, or treatment under study.
That same real-time reporting, directed now at the site, can automatically prompt CRCs to contact participants in a longitudinal study at the right intervals. Why is this important? Missed visits mean missing data, which poses two risks. The first is a failure to collect enough overall data points to achieve the desired statistical power. The second, more subtle risk pertains to whom the missing data belongs. If a certain patient subgroup is disproportionately more likely to miss visits (and therefore leave blank spaces in the final dataset), results become biased toward the subgroups who were compliant with visit schedules.
Missing data is the scourge of registries. Without consistent outreach to all participants from sites, the data collected can easily be skewed by those participants who are proactive in keeping their appointments. Give your sites helpful, regular reminders of upcoming milestones for their participants.
The takeaway? Look for a data management system that allows you to build clear, actionable reports, and to push them out automatically to sites and other stakeholders on a schedule you set.
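As a rough illustration of the kind of check a subgroup report performs, the Python sketch below compares each subgroup’s share of enrollment against a target share and lists the shortfalls. Everything in it is hypothetical: the function name, the age-band labels, the enrollment counts, and the target shares. A real report would draw these from the registry’s own design.

```python
from collections import Counter


def subgroup_shortfalls(enrolled, targets):
    """Map each under-enrolled subgroup to its gap from the target share."""
    counts = Counter(enrolled)
    total = sum(counts.values())
    report = {}
    for group, target_share in targets.items():
        actual_share = counts.get(group, 0) / total if total else 0.0
        if actual_share < target_share:
            report[group] = round(target_share - actual_share, 3)
    return report


# Hypothetical enrollment by age band, with hypothetical target shares.
enrolled = ["18-39"] * 50 + ["40-64"] * 40 + ["65+"] * 10
targets = {"18-39": 0.3, "40-64": 0.4, "65+": 0.3}
print(subgroup_shortfalls(enrolled, targets))
```

Run on a schedule, a check like this tells the data manager which subgroups need retargeted enrollment before the imbalance hardens into bias.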
The life sciences are awash with data, and yet how little of it flows smoothly from tank to tank. My blood type, and yours, is very likely recorded in a database somewhere. Yet, if either of us participates in a study where that blood type is a variable, we are almost certainly looking at a new finger prick.
The situation is poor enough for RCTs, but becomes dire with registries. Registries that don’t easily consume extant secondary data place increased burden on site staff, who are rarely reimbursed well or at all for their contribution. RCTs, on the other hand, often pay per assessment. Also unlike RCTs, registries make more frequent use of this data:
Clearly, the ability to exchange data among multiple sources in a programmatic way (i.e. interoperability) is a must-have for the EDC that will power your registry. Of course, unlike data storage capacity, you can’t quantify interoperability with just a number and a unit of measure. Interoperability is a technical trait that depends on more fundamental attributes:
Data standards – Does the system “speak” an open, globally recognized language, such as CDISC?
API services – Does the system offer clear, well-documented processes for accepting (and mapping) data that is pushed to it from external sources?
Security – Will data that enter, leave, and reside within the system remain encrypted at all times?
Before selecting an EDC, press your prospective vendors on the questions above. Then inquire exactly how they’ll ensure safe and reliable integration between their system and all your data sources.
Contributing to clinical research is, for many, its own reward. The prospect of expanding our medical knowledge and, perhaps, improving patient lives, is a powerful incentive. But it’s easy for a clinician or researcher to lose sight of these ideals in the middle of a hectic workday. When the research is long and unpaid, which is more likely to be the case for a registry than an RCT, the will to “get the work done” can quickly trump the will to do it right.
Leaders of registry operations, therefore, have an even greater responsibility than their RCT peers to keep hurdles low. That’s a wide-ranging obligation, but ensuring a frustration-free data capture experience stands at or near its center.
First, a clinical research coordinator (CRC) should meet with no obstacle the tasks of signing in to their EDC and navigating to the right participant. These are the “low bars.” Even so, they can easily trip up thick-client systems, and even web-based systems that aren’t built for performance or designed with UX (user experience) principles always front of mind.
But the most important ease-of-use tests happen in the context of the case report form (eCRF). Recall that a large portion of registry data comes from clinical encounters that occur in the delivery of standard care. Think pulse oximetry, or resting heart rate. Consequently, any eCRF that can’t be completed while in the exam room ought to have you raising an eyebrow. Accept nothing less than forms that render clearly in any browser, on any device (no matter how it’s held). But that’s not all. Fields on the form need to be “smart:” appearing only when they are relevant; capable of showing specific, real-time messages when the entered value is invalid; and hanging on to input even if an internet connection is lost. Finally, these fields should “remember” and calculate for the CRC, instantly pulling in patient data from visits ago to reference in the current form, and effortlessly turning a height and weight into a BMI.
Can’t pull medical history from the EHR? Help your CRC out with fast and responsive autocomplete fields.
In short, contributing to your registry should go hand in hand with delivering excellent patient care and keeping accurate, up-to-date records. The further those drift apart, the more your registry suffers.
What endpoints are to RCTs, outcomes are to registries. And where there’s a concern with outcomes, there is (often) a concern with patient self-reports. Ergo, chances are high that your next registry may rely on patient-reported outcomes (PRO) as one of its data sources.
If we need to keep the barriers to data submission low for researchers, we need to keep them all but invisible to participants–while ensuring data quality. The simple paper form may appear to offer this balance. Historically, it may have done just that. But twenty years of Internet use have changed our expectations when it comes to offering personal information. Without sacrificing one bit (or byte) of security, we want the same ease in reporting aches to a physician as we find in booking a flight. We want instant “help” when we don’t understand a question, and we don’t want to be asked about matters that don’t apply to us.
Given the expectations above, a study that utilizes even a single PRO instrument can benefit from making the conversion to ePRO. Real-time edit checks, for example, re-orient the participant when their input conflicts with field requirements, without risking the influence of a human interpreter. The time and cost of transcription disappear.
When PRO takes the form of a patient diary, paper’s dirty secrets truly come into the light. Provided the paper form isn’t lost or damaged in the first place, it’s virtually impossible to tell whether a patient made daily diary entries as instructed, or retrospectively wrote responses just prior to a study visit, raising data quality concerns.
As a field, we’ve embraced ePRO for the last decade. But too many ePRO solutions don’t offer the ease or convenience they should. Many depend on provisioned devices, difficult to use and prone to malfunction. Web-based ePRO technologies are a step in the right direction. Here, too, though, industry efforts to deliver an effortless experience often fall short. Special software (such as smartphone apps) requires storage space, not to mention the know-how and patience for download, installation, and activation. Along with everything else participants need to remember, is it really fair–or feasible–to add a password, browser recommendations, and “virtual check-in times” to the list?
The answer lies in allowing patients to use their own devices, be it a laptop or smartphone, and to submit their data on the browser with which they’re most comfortable. Form URLs specially encoded for each participant make passwords unnecessary, while auto-scheduled email and SMS messages provide a friendly, “just-in-time” reminder to make their report. And what better way to convey a message of collaboration with the participant than eConsent? While its role in risky, interventional trials may still be unclear, eConsent is tailor-made for registries: it can deliver an interactive education on the purpose of the study, ensure comprehension with in-form quizzes, and signal to registry leaders real-time recruitment trends.
As for ePRO data collection itself, layout, question order, and response mechanism can all make the difference between valid, timely data and no data at all. The participant isn’t an amateur researcher, and won’t tolerate the kinds of screens all of us envision when we think of EMRs. Data collection should proceed from the simple to the complex, leveraging skip logic to trigger only those questions that are relevant, and using autocomplete to help with terminology. A single column layout, a conspicuous progress bar and page advance button, autosave–all of these features are crucial to treating patients like the study VIPs that they are.
Chances are you’ve already set personal goals for the new year. But have you set professional ones? If not, let me suggest the most meaningful data management resolution you can make for 2019.
“I will build better forms.”
Of all the aspects of eClinical, why rally around forms? For us, the answer is simple. Of all the tools in your toolbelt, optimized forms offer you the greatest leverage in capturing clean data promptly.
Just think about it. You don’t have control over the buzz of clinical and research activity at your sites. You don’t have control over source documents. And you can’t personally visit all your sites, train all your CRCs, or SDV all the items in your study.
So how do you bring order to the (mostly) controlled chaos of a clinical study? You encourage prompt entry of accurate data with forms that are smart, standardized, and, yes, even appealing. Think about what capable forms deliver at the point of entry and downstream:
Timely data entry from CRCs who are thrilled to use your beautiful eCRFs
More accurate data, thanks to specific, real-time edit check messages
Less missing data, thanks to sensible skip logic and clear instructions
Reduced SDV burden, as more and more of your clean, flexible forms become the source
Reduced time to database lock
Easier analysis, thanks to sophisticated “in form” scoring and calculations
Smoother submission, with CDISC-standardized exports
Don’t get us wrong. Tools that expedite study design and user management, fast and reliable system performance, rock-solid security – these are crucial too. But forms are where you, your CRCs, and your data live, day in and day out. So in terms of overall study success, the “ROI” on perfecting your forms is hard to beat.
That’s why we’ll never take our eyes off this so-called fundamental. In fact, we devoted the last few months of 2018 to assembling the best thinking on forms. Not just our thinking, but yours, and that of experts. You can see what we’ve been up to by reading our blog series on cross-form logic or streaming our two December webinars. And we hope you’ll let us know what (in addition to better forms, of course) will change the clinical research landscape this year. Take the poll below!
Equipped with the right system, data managers today have more tools than ever before to capture high-quality data right at its source. But what can the “right system” do? And how should data managers deploy those capabilities to prompt accurate, efficient entry from site staff and participants?
We hosted two webinars this month to answer those questions. Now you can watch them on demand. In Kitchen Sink, you’ll spend an economical 30 minutes understanding how OpenClinica’s form capabilities – from cross-form intelligence to modern, multi-media question types – all work together to serve as the user’s partner in capturing better data, faster. In Good Form, we step back to understand the proper role of these capabilities (not all scales are Likerts!) and climb inside the heads of CRCs and participants to better craft our forms for these study VIPs.
Click either image below to sign into our on-demand webinar library. Then watch, share, and respond with comments to add your expertise to the conversation.
In just thirty minutes, explore all of OpenClinica’s form capabilities, each doing its part to ensure better data, faster! See how skip logic, autocomplete, clickable image maps, real-time edit checks, autosave and a LOT more all work together on beautiful UX to drive cleaner data from the start.
In the examples given here and here, our cross-form logic depended on data with a known location. In the first case, we knew exactly which event, form, and item to turn to in order to retrieve participant sex and date of birth. In the second case, each of our event dates marked the start of a unique, one-time event, so “finding their address” within the database was a straightforward process.
But what happens when we need to reference data with an indeterminate location, supposing that it even exists? In these cases, we may need to walk around a remote neighborhood, comparing building shapes and sizes, before we find what we’re looking for.
Consider a study that requires drug cessation if a certain adverse event recurs within 90 days. For an Alzheimer’s study, that adverse event may be detection of ARIA (amyloid-related imaging abnormalities) on an MRI scan. Suppose that a second presentation of ARIA within 90 days of the first means that the participant must discontinue study drug. What better occasion could there be for “checking the records” than while reporting a new ARIA? Checking the records here means:
retrieving the start dates of any previous AE whose report indicated ARIA
calculating the difference in days between the most recent of the dates above and the new ARIA presentation date
showing an alert if that difference is less than 91 days
It’s hardly complex, but a busy CRC working with dozens of participants in multiple trials may forget to follow the process. Cross-form logic, on the other hand, never forgets. The screenshots below depict the result of a third AE report for a single participant. Of the two previous reports, the first indicated detection of ARIA on 1-Nov-2018. Because the newest ARIA, on 24-Jan-2019, falls within 90 days of the prior one, the form displays instructions to discontinue study drug.
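The three steps listed above can be sketched in Python to show how little logic is involved. This is an illustration only, not the form’s actual expression. The function names are hypothetical, and the dates are the ones given in the example.

```python
from datetime import date


def days_since_last_aria(prior_aria_dates, new_aria_date):
    """Days between the most recent prior ARIA and the new one, or None."""
    earlier = [d for d in prior_aria_dates if d <= new_aria_date]
    if not earlier:
        return None  # no previous ARIA on record
    return (new_aria_date - max(earlier)).days


def must_discontinue(prior_aria_dates, new_aria_date) -> bool:
    """True when the new ARIA falls within 90 days of the previous one."""
    gap = days_since_last_aria(prior_aria_dates, new_aria_date)
    return gap is not None and gap < 91


# Dates from the example: prior ARIA on 1-Nov-2018, new ARIA on 24-Jan-2019.
prior = [date(2018, 11, 1)]
new = date(2019, 1, 24)
print(days_since_last_aria(prior, new))
print(must_discontinue(prior, new))
```

Note that the check has to tolerate the case where no earlier ARIA exists, which is exactly the “indeterminate location” problem described above.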
Few questions are too complex for cross-form logic to answer and act upon. If you can state a rule in logical or mathematical terms, you can most likely implement it using a straightforward expression, no matter how many other forms you need to reference. The OpenDataKit library of XPath functions offers a wealth of tools you can combine to create smart, versatile forms that collaborate with researchers. So don’t let your innovation stop with drug development or study design: carry it through to your forms!
In the previous post, we presented a cross-form example of clinical data collected in one event factoring into the normal lab range for a subsequent event. But clinical data aren’t the only factors that drive decisions. When an event occurred may determine when it should happen next. Dosing visits provide a common example. Depending on the protocol, dosing might occur at precise intervals (e.g. exactly 21 days between doses) or within windows (e.g. at least 7 days and no more than 10 days from the previous dose). Your EDC system should be able to enforce either type of scheduling, by reading not only the dates entered into forms, but dates found in form and event metadata.
In the example illustrated below, the form makes calculations between the start of a current event (“Dosing Visit 2”) and the start of the previous visit (“Dosing Visit 1”). According to this imaginary protocol, no fewer than 7 and no more than 10 days may elapse between these two visits.
If dosing visit 2 occurs within this range, the form guides the site-based user on how to prepare the dose.
If dosing visit 2 has a start date fewer than 7 days after dosing visit 1, the form displays instructions not to proceed, and provides the earliest and latest start dates for the visit.
Finally, if dosing visit 2 has a start date more than 10 days after dosing visit 1, the form displays instructions to submit a protocol deviation note.
All of these calculations and feedback take place instantaneously.
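For illustration, the three branches above can be sketched in Python. The 7- and 10-day limits come from the imaginary protocol in this example. The function name, the message wording, and the dates in the usage lines are hypothetical.

```python
from datetime import date, timedelta

MIN_DAYS, MAX_DAYS = 7, 10  # window from the imaginary protocol above


def dosing_guidance(visit1_start: date, visit2_start: date) -> str:
    """Return the instruction to show for Dosing Visit 2."""
    elapsed = (visit2_start - visit1_start).days
    earliest = visit1_start + timedelta(days=MIN_DAYS)
    latest = visit1_start + timedelta(days=MAX_DAYS)
    if elapsed < MIN_DAYS:
        return (
            "Do not proceed. Earliest start: "
            f"{earliest.isoformat()}; latest start: {latest.isoformat()}."
        )
    if elapsed > MAX_DAYS:
        return "Submit a protocol deviation note."
    return "Proceed with dose preparation."


# Hypothetical dates: Dosing Visit 1 started on 1 January 2019.
visit1 = date(2019, 1, 1)
print(dosing_guidance(visit1, date(2019, 1, 5)))
print(dosing_guidance(visit1, date(2019, 1, 9)))
print(dosing_guidance(visit1, date(2019, 1, 15)))
```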