CDS for Non-Data Scientists Part 2: Resources for Bridging the Gap

59,000,000,000,000,000,000,000 bytes. That's 59 zettabytes, or 59 sextillion discrete pieces of information. Don't feel bad; I had to look up those words too. According to the International Data Corporation, that is the total amount of data estimated to have been generated since the start of the digital age through 11:59 pm on December 31, 2020.

Quantity                                                   Magnitude   SI Prefix
Data on Planet Earth                                       10^21       Zetta-
Rough Estimate of the Number of Sand Grains on Earth       10^18       Exa-
Current Estimate of the Number of Stars in the Universe    10^21       Zetta-
Estimate of the Drops of Water on Earth                    10^24       Yotta-

Assume for a moment that you were able to magically download all that data onto a single computer. It would take 3,277,777,778 of the largest currently commercially available hard drives (18 TB), each costing approximately $1,000 as of this posting, for a grand total of well over 3 trillion dollars. Now let's say that, through science or magic, each drive was the thickness of a US dollar bill; they would form a stack 223 miles tall. One more figure, because I am having WAY too much fun [OpenClinica asked a data nerd to write about data, so they should have known what they wrought]: in roughly a century (give or take a decade), humanity is expected to have generated more data than there are atoms on Earth.
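For the fellow data nerds, here is a quick back-of-the-envelope check of those figures, using Python as a calculator (the dollar-bill thickness of roughly 0.0043 inches is my own assumption):

```python
# Back-of-the-envelope check of the figures above; the bill thickness is an assumption.
DATA_BYTES = 59e21            # 59 zettabytes
DRIVE_BYTES = 18e12           # 18 TB per hard drive
DRIVE_COST_USD = 1_000
BILL_THICKNESS_IN = 0.0043    # approximate thickness of a US dollar bill, in inches

drives = DATA_BYTES / DRIVE_BYTES
total_cost = drives * DRIVE_COST_USD
stack_miles = drives * BILL_THICKNESS_IN / 12 / 5280

print(f"Drives needed: {drives:,.0f}")             # ~3.28 billion
print(f"Total cost:    ${total_cost:,.0f}")        # ~$3.3 trillion
print(f"Stack height:  {stack_miles:,.0f} miles")  # ~220 miles
```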
[Satirical image: a fake "leaked" NASA document]

In a recently leaked document, NASA has expressed interest in converting Mars into the largest data center in the solar system.

With all this data, it’s hardly a surprise that data science is one of the “it” careers and that more and more career paths need to be data fluent.

For several years now, I've held a top-down view of data science that might make me a lot less popular: learn data science, and I mean really learn it, and clinical data science will be a snap [mostly applying principles you already know; after all, to a computer, an int data type is pretty much an int, whether it's a count of TVs sold at a store or a heart rate]. The inverse is rarely true. I once had a colleague who is a statistician (as in a real, Ph.D.-holding, more-Greek-symbols-on-his-whiteboard-than-English type of statistician) who convinced me of a similar rationale with regard to statistics vs. biostatistics.

What makes a good data scientist?

If you are reading this blog, especially at 10:30 at night or while you munch away on lunch, you probably already have some of the qualities of a data scientist! Do you have an innate, some might argue pathological, desire to understand how things work? Are you infamous for your attention to detail? Obviously, success in data science is a bit more complicated and nuanced. Still, at the heart of it all, data science is driven by a desire to use information to improve decision-making. No knee-jerk decisions or gut feelings are allowed here. Those devils and angels on your shoulders can stay home too. So if you fit these criteria, read on, because you may have just found your calling. If you don't, read on anyway.

How do I break into the data science field?

This is a path I have only just started down myself, perhaps taken the second or third step on, so please, please don't treat this as an exhaustive, stay-inside-the-lines type of article. Your path is yours alone. I can only offer some guidance and helpful hints that I've found along the way.

The first question you really need to grapple with is how much you want to get into data science. That isn’t meant to be derisive or anything of the sort. You’ll learn, if you haven’t already, that virtually everything in life has a cost. Is that super-specialized Ph.D. program worth the 5-7 years of work and time away from the workforce, not to mention the late nights staring at the computer screen? Maybe it is, and if so, go for it, but you still need to ask yourself if it’s worth it to you.

In the broadest sense, entering or advancing in the data science field takes the form of formal vs. informal training. Traditionally, formal training takes the form of an advanced degree of some type, while the informal is far more self-driven and created by you. Although neither is inherently better than the other, they both have some positives and some drawbacks, as you’ll soon see.

Back to school: is formal higher education suitable for you?

Theoretically, you can still enter the data science field with a bachelor's degree and a strong math, science, and programming background; however, those days seem to be numbered. As more and more institutions have started offering advanced degrees in data science, the expectation that a serious candidate should have post-bachelor's training has increased. Therefore, if you decide to invest the time, money, and effort into a graduate degree, you should know a few things first.

First, before you set foot in a data science class at a major university, you are usually expected to have completed:

  • three semesters of calculus,
  • one semester of linear algebra,
  • freshman computer programming,
  • and possibly differential equations and/or upper level statistics.

Not only that, but you're expected to remember them. I know, I was shocked too. So if you don't have those classes on your transcript, or you don't recall how to calculate the multiplicative inverse of a matrix, you should probably brush up on those topics. More on that later.
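If it has been a while, a small refresher is only a few lines away. Here is a minimal sketch using NumPy (a deliberately tiny 2x2 example, not a full linear algebra review):

```python
# Toy refresher: the multiplicative inverse of a matrix A is the matrix A_inv
# such that A @ A_inv equals the identity matrix (it exists only when det(A) != 0).
import numpy as np

A = np.array([[4.0, 7.0],
              [2.0, 6.0]])

A_inv = np.linalg.inv(A)
print(A_inv)
print(np.round(A @ A_inv, 10))  # approximately the 2x2 identity matrix
```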

This is my second attempt at a graduate program, so here is some advice for those considering a research-based degree. First, plan on spending roughly 4 hours every week studying and preparing for every 1 hour of coursework on your schedule. Sometimes this can be more, rarely less, depending on the specific course and your background. Then, if you have teaching duties, plan on another 15 or so hours per week teaching and grading. Oh, then there is the research and writing you're expected to do, so budget another 15 hours for that. Oh, I almost forgot lab group meetings and any administrative responsibilities you might have; to train graduate students how to run a lab, faculty often delegate to their students. Finally, as you continue in any program, you will usually have mentoring and leadership roles to take up your few remaining hours of freedom. Can you somehow sleep a negative number of hours??

A word to all the "gentleman's C" students out there: know that you are generally expected to maintain a 3.0 GPA in any graduate program.

Now that I've gotten most of the bad stuff out of the way, on to the good. If you like those kinds of environments and have a deep desire to learn, you may well have the time of your life. You'll often be working on the cutting edge of science and technology, alongside people who literally wrote the book on their subject. There is also an increasing number of online degree programs, including ones designed specifically for working professionals.

Notice that a couple of paragraphs earlier I mentioned research-based degrees. Usually, you can tell a degree is research-based by its letters: M.S. and Ph.D. On the other hand, professional degrees are more likely to be MPH, MDS, or any number of other acronyms. Professional degrees will usually be more practically oriented and, perhaps more importantly to you, may require less in the way of prerequisites. There is always a tradeoff, though: professional degrees typically require work experience, usually 3-5 years. For instance, the degree I'm currently pursuing (a professional degree) doesn't require any specific courses on your transcript, BUT you are still held accountable for those skills and knowledge as if you had just taken the prerequisites I listed earlier.

A few words about online graduate degrees

Yes, there are a few high-quality online degrees out there, but it is often a case of buyer beware. Some things you want to look for are:

  1. Are they attached to a traditional, physical university? And if so, do they offer an approximate equivalent “on-campus” program? This is always a plus because such universities must maintain a certain standard or risk losing their accreditation. There is also a greater likelihood that you will be dealing with quality faculty.
  2. Will your degree or transcript say "Online" on it? Honestly, this is a very subjective issue. For me, concern 1 is far more critical than this. Still, if you're going to put in 2-5 years of work on a master's degree, you don't want to risk coming out the other side with it being perceived as worth less than a "traditional" program.

Eh…got anything else???

YES!!!

One step down in intensity from a formal degree are boot camps, webinars, certificates, and professional development programs. These can be very practical options, allowing you to demonstrate your talent with a type of real-life legitimacy that formal degrees don't inherently impart.

Lastly, just taking time to read and learn about a given subject can be hugely beneficial. Perhaps you don’t need to know every nook and cranny of data science all at once. Maybe you can start by asking a single question or picking a specific topic. For example, “How do I use SQL to program a database?” or “What is this ‘GitHub’ thing I keep seeing in my Google results?” Then you can expand your knowledge from there.
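As a taste of that first question, here is a minimal sketch of what "using SQL" can look like in practice, using only Python's built-in sqlite3 module and a made-up table:

```python
# A tiny, self-contained example: create an in-memory database, load a few rows,
# and ask a question of the data with SQL. The table and values are invented.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vitals (subject_id TEXT, visit TEXT, heart_rate INTEGER)")
conn.executemany(
    "INSERT INTO vitals VALUES (?, ?, ?)",
    [("001", "baseline", 72), ("001", "week4", 68), ("002", "baseline", 81)],
)

for row in conn.execute(
    "SELECT subject_id, AVG(heart_rate) FROM vitals GROUP BY subject_id"
):
    print(row)
conn.close()
```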

If you consider that the true pioneers of any field had to discover it and make it up as they went, there’s no reason why we couldn’t learn the same way.

Some (but far from all) US-Specific Programs

Traditional Degrees (MS, PhD, or both)

  • University of Arkansas Medical School [on-campus certificate, master's & PhD in biomedical informatics, including a specialization specifically in clinical research informatics]
  • University of Colorado Boulder [on-campus MS-DS]
  • Indiana University Bloomington [several on-campus master's & PhD programs in bioinformatics and data science]

Online Degree Programs

  • University of Colorado Boulder [online MS-DS]
  • Indiana University Bloomington [online MS in Data Science]
  • University of New Hampshire [online MS in Health Data Science]

Other Educational Opportunities

  • SpringBoard [online data science bootcamp]
  • University of Colorado Clinical Data Science Specialization [Certificate on Coursera]

Some Resources that I like

Addendum: My Data Science Pathway (a personal story)

So, now that we’ve covered all that, I’d like to give you a peek into my journey so far to becoming a data scientist. Keep in mind, we could be polar opposites for all I know, so you may favor a different path, and I certainly have no patent on this method, so feel free to rip it off 100%.

First, it's been a while since I had any type of formal schooling. I left my original graduate program in 2012 and haven't had much exposure since then, so I knew I needed to brush up on my background knowledge; running the occasional ANOVA in the intervening years hardly covers everything you need to know. To refresh my math and statistics chops, I chose Brilliant.org. Though there are plenty of similar websites, I picked the one that matches my learning style and budget. I generally try to do a lesson (they call them quizzes) per day.

I knew I also needed to improve my programming skills for data science. I feel pretty comfortable with C++, C#, and Java, for instance. Python, not so much. It's just a question of exposure to the language for me, so I picked DataCamp.com. This site is a dependable resource because it deals specifically with data science applications. It admittedly costs a little more than I wanted to spend, but I found a nice 3-month free subscription through the GitHub Student Developer Pack; all I needed was a valid school email address.

One last preparatory tip I'll offer is to make a cheat sheet. No, not to actually cheat. Virtually every language and package (you can commonly have 10 or more active at any time in real-life applications) has slight variations from the others, and I've found it really is impossible to keep them all straight at once. Case in point: perhaps you're working with 2 different SQL databases at once, one in PostgreSQL and the other in T-SQL. There is just enough difference between those two to cause you headaches, so maintaining a 3-4 page quick reference can be a life-saver. I've found the ones from www.quickstudy.com to be a solid starting point, but mine usually end up covered in post-it notes all the same.
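To make that concrete, here is the sort of entry my cheat sheet holds, sketched as illustrative query strings (the table name is invented): the same "first 10 rows" request written for PostgreSQL and for T-SQL.

```python
# Illustrative dialect differences worth keeping on a cheat sheet:
# "first 10 rows" is LIMIT in PostgreSQL but TOP in T-SQL / SQL Server.
DIALECT_CHEATSHEET = {
    "postgresql": "SELECT * FROM adverse_events ORDER BY onset_date LIMIT 10;",
    "tsql":       "SELECT TOP 10 * FROM adverse_events ORDER BY onset_date;",
}

def first_ten_query(dialect: str) -> str:
    """Return the right syntax for whichever backend you happen to be working in."""
    return DIALECT_CHEATSHEET[dialect]

print(first_ten_query("postgresql"))
print(first_ten_query("tsql"))
```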

For virtually every other topic, I made good use of Kindle and YouTube to make sure I was solid. This leads me to another pearl of wisdom about graduate school: you can't be too prepared. Transitioning from an undergraduate program (or what you remember of one) to a graduate program is like suddenly being drafted into a professional sport. The level at which you are expected to perform is high, and the learning curve is steep.

For my actual master’s program, I chose one that launched only recently. The University of Colorado Boulder has an online version of their on-campus master’s degree that is a solid fit for what I want. I also have found I do better with shorter courses, and this program has academic terms of 2 months. To a good approximation, they take each 3-hour class in the on-campus degree and chop it into 3 pieces. It is also 100% asynchronous; hence as long as the coursework is completed by the deadline, I can watch the lectures and do the work whenever I can fit them in.

Clinical Data Science for Non-Data Scientists (Part 1)

Einstein is often credited with saying, "If I had an hour to save the world, I would spend 55 minutes defining the problem." In that spirit, we begin by defining what we mean by clinical data science, at least for the purposes of this blog series. CDS is the application of data science to medical and therapeutic decision-making, including research applications. Simple, right? So far, so good?

Well, now comes the hard part: defining "data science" itself, as even seasoned professionals can be inconsistent with terminology. Is a given example best characterized as data science, data analytics, or perhaps business intelligence? Do you need a data visualization expert? Perhaps we should just hand everything to a statistician and hold our breath? These distinctions are more than purely academic, as finding the right person with the right skillset can seriously impact your outcome.

Before going any further, it is worth noting that, for historical reasons as much as anything, many of these job titles overlap, and many of the people who hold them have almost identical skillsets. The definitions and distinctions offered here are just one of many possible frameworks, but they seem to represent a plurality view and perhaps the early signs of a convergent system. To help unpack these concepts, let's examine them through the lens of when a given question would likely be addressed.

  • A statistician typically looks at data through a hindsight lens. They will generally try to answer questions about what happened and frame it in relation to attempting to pinpoint a true value.
  • A data visualization expert creates dashboards and similar tools to help decision-makers interpret data faster and easier.
  • A business intelligence expert will try to pull information from multiple sources, combine it with a knowledge of business operations, and make decisions largely based on the business’s own interests.
  • A data analyst will typically analyze past data, often with some statistical inference techniques, to make predictions and decisions about future events.
  • A data scientist uses real-time, or near real-time, data to make decisions through largely algorithmic techniques.

As you can see, the difference between the definitions of data analyst and data scientist is subtle, but the key to understanding it is that a data scientist's approach will tend to adapt much faster than a data analyst's. In essence, the data analyst may be examining trends a month, a quarter, a year, or more old. This is not to say that such analyses are not useful – one of the key business metrics is year-over-year performance analysis, which is exclusively in the purview of a posteriori analysis.

The advent and evolution of clinical data science as a discipline includes some exciting possibilities for clinical trials. A greater emphasis on adaptive trials and real-world evidence can increase the speed of trials and the validity of results. In fact, in some diseases, especially those with a strong genetic component (such as oncology or autoimmune diseases), an adaptive clinical trial can often decrease time, costs, and risks to subjects and sponsor alike (Pallmann et al., "Adaptive Designs in Clinical Trials: Why Use Them, and How to Run and Report Them," 2018). Umbrella, basket, and platform trial designs have all seen an increase in the past decade as:

  1. knowledge of molecular genetics has increased,
  2. the cost and difficulty of molecular techniques have decreased,
  3. and the complexity of interim analysis has decreased.

A knowledgeable clinical data scientist could even conceivably program a majority of interim analyses to run repeatedly (using an appropriate correction for multiplicity issues), which would decrease the downtime that can hinder adaptive designs. This approach could also end trials that have become unproductive or unnecessary earlier than a traditional approach would.
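As a rough illustration of the idea (not a validated analysis plan), here is a minimal sketch of repeated interim looks on simulated data with a crude Bonferroni split of the overall alpha; a real adaptive design would use a prespecified alpha-spending function and operate under a DSMB charter.

```python
# Sketch: repeated interim looks at accumulating (simulated) trial data, with the
# overall alpha split evenly across the planned number of looks to control multiplicity.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

N_PER_ARM = 200                          # planned enrollment per arm (hypothetical)
LOOKS = 4                                # planned interim analyses
OVERALL_ALPHA = 0.05
ALPHA_PER_LOOK = OVERALL_ALPHA / LOOKS   # crude Bonferroni-style correction

# Simulated outcomes; in practice these would come from the study database.
treatment = rng.normal(loc=0.4, scale=1.0, size=N_PER_ARM)
control = rng.normal(loc=0.0, scale=1.0, size=N_PER_ARM)

for look in range(1, LOOKS + 1):
    n = int(N_PER_ARM * look / LOOKS)    # data accumulated by this look
    result = stats.ttest_ind(treatment[:n], control[:n])
    print(f"Look {look}: n/arm={n}, p={result.pvalue:.4f} (threshold {ALPHA_PER_LOOK:.4f})")
    if result.pvalue < ALPHA_PER_LOOK:
        print("Boundary crossed -- flag for DSMB review and possible early stop.")
        break
```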

Almost all parties at all levels in clinical trials can benefit from the use of data science. Industry sponsors could see the most direct benefits in both cost and time reduction. Traditionally a large pharmaceutical company may value data science mostly to achieve time savings. In contrast, smaller start-ups and midsize companies might see value primarily on the side of cost savings. A sponsor can reduce the resources required to bring a product to market by applying a more adaptive design to trials that can support a greater frequency of interim analyses without the traditional overhead of scheduling a database freeze, completely resolving queries, and having a small army of statisticians and programmers spend weeks only to then have to present the data to a DSMB or similar body.

A clinical data scientist could theoretically preprogram all the necessary analysis and utilize any number of machine learning techniques to mitigate many unforeseeable circumstances, such as missing or erroneous data, outliers, or noncompliance. Machine learning, coupled with Natural Language Processing and search engine spiders, a.k.a. web crawlers, could conceivably enable sites and sponsors to monitor web forums, messages within EHR systems, and many more systems for SAEs or underreported adverse events. Similarly, a clinical data scientist could use emerging technologies to gather and process data to a degree that even 5 years ago would have seemed impossible. While technology continues to evolve at a substantial pace, it seems likely that humans will always be part of the process; therefore, it is unlikely that overhead can ever be reduced to zero, but data science can greatly reduce it.
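At its simplest, such monitoring might start as a keyword screen over free text, as in the sketch below (the terms and messages are invented; a production system would use a trained NLP model and proper medical coding, e.g., MedDRA):

```python
# Toy screen: flag free-text messages that mention possible adverse-event terms
# for human review. Terms and messages are invented for illustration.
import re

AE_TERMS = ["hospitalized", "rash", "nausea", "dizziness", "emergency room"]
pattern = re.compile("|".join(map(re.escape, AE_TERMS)), re.IGNORECASE)

messages = [
    "Patient reports mild nausea after the second dose.",
    "Scheduling question about the next visit window.",
    "Subject went to the emergency room over the weekend.",
]

for msg in messages:
    hits = pattern.findall(msg)
    if hits:
        print(f"Possible AE mention ({', '.join(hits)}): {msg}")
```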

CROs and SMOs tend to take a more business intelligence approach to clinical data science. It makes sense that if site A sees 5x more lung cancer patients than site B, then it should reasonably follow, all things being equal, that site A should get the phase III trial for a novel targeted lung therapy. This is an oversimplified example, but it illustrates the point that data can and should inform business decisions wherever possible. If a CRO notices a sharp uptick in the number of queries for a given site, they might arrange for retraining at the next monitor visit to increase compliance.
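A toy version of that query-uptick check might look like the following sketch (site names and counts are hypothetical):

```python
# Flag sites whose open-query count in the latest month is well above their own
# recent baseline, as a prompt for targeted retraining at the next monitoring visit.
queries_by_month = {
    "Site A": [12, 10, 14, 11, 35],   # hypothetical monthly query counts
    "Site B": [8, 9, 7, 10, 9],
}

for site, counts in queries_by_month.items():
    baseline = sum(counts[:-1]) / len(counts[:-1])
    latest = counts[-1]
    if latest > 2 * baseline:
        print(f"{site}: {latest} queries vs. baseline {baseline:.1f} -- schedule retraining")
```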

Regulatory agencies and quality assurance departments can use data science to sharpen risk-based monitoring programs and to target the distribution of routine audits, increasing both the efficiency and the effectiveness of these programs. The US FDA's BIMO (Bioresearch Monitoring) program already uses some algorithmic approaches to determine where to send inspectors. The next logical evolution would likely be to incorporate machine learning to make the algorithm more capable, perhaps using Natural Language Processing to spot trends in FDA 3500 forms (for safety reporting) or even in social media groups discussing research experiences.

Research sites, too, can use data science methods to increase their own efficiency. Let's be frank; we've all at least heard of sites that will apply to join every study even remotely applicable to their practice. It's understandable – in the purely academic realm, the term of art is "publish or perish," and if you don't stay in the black, then you won't be a site for very long. This shotgun approach to research participation can have unintended consequences, however. Even something as simple as completing a study application form can take hours, contracts can take days, and study start-up can take weeks. All this time is, as they say, money. A data-driven approach to study selection has the potential to guard against this tendency.

Is Your Clinical Trial Software Effective, or Just Efficacious? (Part 2 of 2)

When it comes to assessing your trial technology, your data managers, study coordinators, investigators, and senior leaders are all study subjects.

In the previous post, I described the difference between efficacy and effectiveness, an increasingly important concept in clinical research and healthcare. After stressing the importance of effectiveness research to health policy planning and patient decision-making, I summarized seven criteria for identifying effectiveness studies. Finally, I asked whether these criteria could be re-purposed beyond a medical intervention to inform how we measure the effectiveness of software systems used to conduct clinical trials.

Is it possible to assess clinical trial software through the lens of effectiveness, as opposed to just efficacy?

I believe that it’s not only possible, but crucial. Why? We all want to reduce the time and cost it takes to deliver safe, effective drugs to those that need them. But if we don’t scrutinize our tools for doing so, we risk letting the status quo impede our progress. When lives are on the line, we can’t afford to let any inefficiency stand.

In this post, I adapt the criteria for effectiveness studies in clinical research into a methodology for evaluating the effectiveness of clinical research software. I limit the scope of adaptation to electronic data capture (EDC) systems, but I suspect that a similar methodology could be developed for CTMS, IVR, eTMF and other complementary technologies. If I open a field of inquiry, or even just broaden one that exists, I’ll consider it time well spent.


The Forecast is Cloudy

GE recently announced it is moving its 9,000 supported applications to the cloud. Nowadays, all of us are bombarded with information about “the cloud”, and it can be hard to wade through the hype and hyperbole to understand the landscape in a way that helps us make decisions about our own organizations.

Enterprise cloud computing is a complex topic, and how you look at it depends on many variables. Below I try to outline one typical scenario. Your inputs, and the weight you give to the different factors involved in making the decision, will vary, but the general paradigm is useful across a wide variety of organizations.

In the interest of full disclosure, I am CEO of a company that sells cloud-based clinical research solutions (OpenClinica Enterprise, OpenClinica Participate). We adopted a cloud model after going through exercises similar to the ones below. Rather than reflecting bias, it demonstrates our belief that the cloud model offers the greatest combination of value for the greatest number of organizations in the clinical research market.

So… Let’s say you’re a small-to-medium size enterprise, usually defined as having under 1000 staff, and you are considering moving your eClinical software technologies to a public cloud and/or to a Software-as-a-Service (SaaS) provider.

Let’s start with the generic move of in-house (or co-located) servers and applications to public cloud environment. We’ll get to SaaS in a bit.

Economics

For this exercise, we'll use the handy modelling tools from Intel's thecloudcalculator.com. We'll assume you want to run mission-critical apps, with high levels of redundancy that eliminate single points of failure, and we'll compare setting up your own infrastructure using traditional virtualization to a similar setup in the cloud, based on a common set of assumptions.

The results for an internal, or "private," cloud are:

[Chart: private cloud cost estimate from thecloudcalculator.com]

The public cloud looks as follows:

[Charts: public cloud cost estimate and cost comparison from thecloudcalculator.com]

Source: http://thecloudcalculator.com

Wow. A 26x difference in cost. Looks pretty compelling, right? But not totally realistic – you’re probably not considering building a highly redundant in-house or co-located data center to host just a couple of apps. Either you already have one in place, or are already deploying everything to the cloud. In the latter case, you don’t need to read further.

In the former case, let's explore the cost of adding several more applications to your existing infrastructure. What is the marginal cost of adding the same amount of computing capacity (12GB of memory, 164GB of storage) on top of an existing infrastructure? We can use the same calculator to compute the delta in total cost when that capacity is added to a private cloud with 190GB of memory and 836GB of storage. But here it gets much trickier.

According to the calculator, our 190GB cloud costs $379,324 – the same as the 12GB cloud in the first example! Moreover, adding another 12GB of capacity pushes the cost up to $513,435, a difference of $134,111. However, if we change our assumptions and start with a 150GB cloud, then add 12GB of capacity, the marginal cost is $0.

What we're seeing is that the IT overhead costs of running your own private cloud infrastructure tend to grow in a discrete, rather than continuous, manner, and the jump from one tier to the next is usually very expensive.

Our calculator makes a bunch of assumptions about the size of each server, and at what point you need to add more hardware, personnel, cooling, etc. The exact number where these thresholds lie will vary for each organization, and the numbers in the example above were picked specifically to illustrate the discrete nature of IT capacity. But the principle is correct.
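The step-wise versus pay-per-use dynamic can be sketched in a few lines of Python; every number below is invented purely to illustrate the shape of the two cost curves, not to quote actual prices.

```python
# Illustrative only: private-cloud costs jump in discrete tiers as capacity crosses
# hardware/staffing thresholds, while public-cloud costs grow roughly linearly
# with the capacity actually in use.
import math

def private_cloud_cost(capacity_gb, tier_size_gb=100, cost_per_tier=380_000):
    """Step-wise cost: every started capacity tier incurs a full tier's cost."""
    return math.ceil(capacity_gb / tier_size_gb) * cost_per_tier

def public_cloud_cost(capacity_gb, cost_per_gb=100):
    """Roughly linear pay-per-use cost."""
    return capacity_gb * cost_per_gb

for gb in (12, 150, 162, 190, 202):
    print(f"{gb:>4} GB   private: ${private_cloud_cost(gb):>9,}   public: ${public_cloud_cost(gb):>7,}")
```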

Large cloud providers, on the other hand, mask the step-wise and sunk capital costs from customers by only charging for each incremental unit of computing actually in use. Because these providers operate at a huge scale, they are able to always ensure excess supply and they can amortize their fixed and step-wise costs over a large number of customers.

The examples above show that the actual costs of a public cloud deployment are likely to be significantly lower than those of building or adding to a comparable private cloud. While there’s no guarantee that your public cloud cost will be less than in-house or colocated, market prices for cloud computing continue to become more competitive as the industry scales up.

What is certain, however, is that the flexibility of the public cloud model eliminates the need for long-term IT capital budget planning and ensures that a project won't be subject to delays caused by hardware procurement pipelines or data center capacity issues. In most cases it can also reduce the burden on IT personnel.

Qualitative Advantages

The central promise of the cloud is a fundamental difference in the ability to run at scale. You can deploy a world class, massively scaled infrastructure even for your first proof-of-concept without risking millions of dollars on equipment and personnel. When Amazon launched the S3 cloud service in 2006, its headline was “Amazon S3 enables any developer to leverage Amazon’s own benefits of massive scale with no up-front investment or performance compromises”.

It is a materially different approach to IT that enables tremendous flexibility, velocity, and transparency, without sacrificing reliability or scalability. As Lance Weaver, Chief Technology Officer for Cloud at GE Corporate, puts it, "People will naturally gravitate to high value, frictionless services." The global scale, pay-as-you-go pricing models, and instantaneous elasticity offered by major public cloud providers are unlike anything in the technology field since the dawn of the Internet. If GE can't match the speed, security, and flexibility of leading public cloud providers, how can you?

What You Give Up

At some level, when moving to the cloud you do give up a measure of direct control. Your company's employees no longer have their hands directly on the raw iron powering your applications. However, the increased responsiveness, speed, and agility enabled by the cloud model gives you far more practical control than the largely theoretical advantages of such hands-on ownership. In a competitive world, we outsource the generation of electrical power, banking, the delivery of clean, potable water, and access to global communications networks like the Internet. Increasingly, the arguments for the cloud look similar, with the added benefits of continuous, rapid improvements and falling prices.

Encryption technologies and local backup options make it possible to protect and archive your data in a way that gives you and your stakeholders increased peace-of-mind, so make sure these are incorporated into your strategy.

Risk Reduction

The model above is based on the broad economics of the cloud. However, there are other, more intangible requirements that must be met before a change can be made. You'll want to carefully evaluate a solution to ensure it has the features you need and is fit for purpose, and that the provider you choose gives you transparency into the security, reliability, and quality of their infrastructure and processes. Make sure that data ownership and level of access are clear and meet your requirements. Ensure you have procedures and controls in place for security, change control, and transparency/compliance; these would be required controls for in-house IT or a private cloud as well. One benefit of public cloud providers in this area is that many of them offer capabilities that are certified or audited against recognized standards, such as ISO 27001, SSAE 16, ISAE 3402, and even FISMA. Some will also sign HIPAA Business Associate Agreements (BAAs) as part of their service. Adherence to these standards may be part of the entry-level offering, though sometimes it is only available as part of a higher-end package. Be sure to research and select a solution that meets your needs.

External Factors

No matter who you are, you are beholden to other stakeholders in some way. Here are a couple of areas to make sure you pay attention to:

  • Regulation – Related to risk reduction, you want to have controls in place that adhere to relevant policies and regulations. In clinical research, frameworks such as ICH Good Clinical Practice and their principles of Computer System Validation (CSV) are widely accepted, well understood, and contain nothing that is a barrier to deploying a well-designed cloud with the appropriate due diligence. You may also have to consider national health data regulations such as HIPAA or EU privacy protections. Consider if data is de-identified or not, and at what level, to map out the landscape of requirements you’ll have to deal with.
  • Data Storage – A given project or group may be told that the sponsor, payer, institution, or regulatory authority requires in-house or in-country storage of data. Sometimes this is explicitly part of a policy or guideline, but just as often it is more of a perceived requirement, because "that's the way we've always done it". If there is wiggle room, think about whether it is worth fighting to be the exception (more and more often, the answer is yes). Gauge stakeholders such as your IT department, who nowadays are often overburdened and happy to "outsource" the next project, provided good controls and practices are in place.
  • Culture – a famous saying, attributed to management guru Peter Drucker, is that “Culture eats strategy for breakfast, every time”. Putting the necessary support in place for change in your organization and with external stakeholders is important. The embrace of cloud at GE and in the broader economy helps. Hopefully this article helps :-). And starting small (something inherently more possible with the cloud) can help you demonstrate value and convince others when it’s time to scale.

SaaS

SaaS (Software-as-a-Service) is closely tied to cloud, and often confused with it. It is inherently cloud-based but the provider manages the details all the way up to the level of the application. SaaS solutions are typically sold with little or no up-front costs and a monthly or yearly subscription based on usage or tier of service.

[Diagram: the SaaS / PaaS / IaaS responsibility stack]

Source: http://venturebeat.com/2011/11/14/cloud-iaas-paas-saas/

When you subscribe to a SaaS application, your solution provider handles the cloud stuff, and you get:

  • a URL
  • login credentials
  • the ability to do work right away


A few years ago, you typically had to balance this advantage (lack of IT headaches and delays) against the lack of a comprehensive feature set. As relatively new entrants to the market, SaaS platforms didn't yet have all the coverage of legacy systems that had been around for years, or in some cases decades. However, the landscape has changed. The SaaS provider is focused on making its solution work great in just one, uniform environment, so it can devote more of its resources to rapidly building and deploying high-quality features and a high-quality user experience. The result is far more parity: most SaaS solutions have caught up and are now outpacing legacy technologies in the pace of improvements to user experience, reliability, and features. Legacy providers, meanwhile, have to spend more and more resources dealing with a complex tangle of variations in technology stack, network configuration, and IT administration at each customer site.


Furthermore, the modern SaaS provider can reduce, rather than increase, vendor lock-in. Technology market forces demand that interoperability be designed into solutions from the ground up. Popular SaaS patterns such as microservice APIs mean your data and content are likely to be far more accessible, both to users and to other software systems, than when locked in a legacy relational database.
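For example, pulling your own data out of a SaaS application can be as simple as an authenticated HTTP call to a documented endpoint. The sketch below uses only the Python standard library; the URL, token, and study identifier are placeholders rather than any real service's API.

```python
# Sketch: retrieving study data through a (hypothetical) JSON REST API instead of
# digging it out of a vendor's relational database. Endpoint and token are placeholders.
import json
import urllib.request

def fetch_subjects(base_url: str, token: str, study: str) -> list:
    """GET the subject list for a study from a hypothetical JSON API."""
    req = urllib.request.Request(
        f"{base_url}/studies/{study}/subjects",
        headers={"Authorization": f"Bearer {token}", "Accept": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Example call (placeholders only):
# subjects = fetch_subjects("https://edc.example.com/api/v1", "YOUR_API_TOKEN", "DEMO-001")
# print(len(subjects), "subject records retrieved")
```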

The SaaS provider can focus on solving the business problems of its customers, and use increasingly powerful cloud infrastructure and DevOps technologies to automate the rest in the background in a way that just works. These advantages get passed along to the customer as continuous product improvements and the flexibility to scale up and down as you need to, without major capital commitments.

Conclusion

YMMV, but cloud & SaaS are powerful phenomena changing the way we live and work. In a competitive environment, they can help you move faster and lower costs, by making IT headaches and delays a thing of the past.


2015 Future of Open Source Survey Results

Open source software has emerged as the driving force of technology innovation, from cloud and big data to social media and mobile. The Future of Open Source Survey is an annual assessment of open source industry trends that drives broad industry discussion around key issues for new and established software-related organizations and the open source community.

The results from the 2015 Future of Open Source Survey reflect the increasing adoption of open source and highlight the abundance of organizations participating in the open source community. Open source continues to speed innovation, disrupt industries, and improve productivity; however, a reported lack of formal company policies and processes around its consumption points to a need for OSS management and security practices to catch up with this growth in investment and use.

Check out the slides below for survey results.

Reacting to #ResearchKit

Apple, Inc. has a remarkable ability to capture the world’s attention when announcing “the next big thing.” They have honed their well-known Reality Distortion Field skills for over 30 years. As the largest company in the world, and bellwether of the technology industry, Apple’s announcements are immediately recounted, opined, lionized, and criticized all across the Internet—sometimes with very limited real information on the new product itself. Of course, it helps to have their unmatched track record in actually delivering the next big thing.

ResearchKit has grabbed such attention. Maybe not as much as The Watch, but amongst the minority of us who pay attention to such things. And the reactions have been typically polarized—it’s either an “ethics quagmire” or “Apple fixing the world.”

But reality rarely presents an either-or proposition. I’ve written before on the need to use technology in simple, scalable ways to engage more participants in research and capture more data. Every form of engaging with patients and conducting research is fraught with potential for bias, bad data, and ethical dilemmas. Properly controlling these factors is difficult, and the current handling of them leads many to conclude that clinical research is overly “bloated and highly controlled”. There’s truth to that, but the fundamental need for good controls is real. As technology enables us to engage in new ways, how we implement such controls is likely to transform, perhaps unrecognizably so.

I don’t think Apple—or anybody—has these problems fully solved yet. And I expect we’re going to see a vigorous debate in the coming years between #bigdata and #cleandata that I hope will lead us to more of both. But ResearchKit, or at least the announcement thereof, is a game changer. Whether or not ResearchKit in its present form becomes a widely adopted platform, the impact was felt overnight: “11,000 iPhone owners signed up for a heart health study using Apple’s newly-announced ResearchKit in the first 24 hours… To get 10,000 people enrolled in a medical study normally, it would take a year and 50 medical centers around the country”. ResearchKit builds on the momentum towards patient-centricity established in the last five years within pharma, NIH, online patient communities, mHealth, and health care, and uses Apple’s consumer clout to bring it to the attention of the average person on the street.

So let’s break down what we know about ResearchKit. Since this is a blog about OpenClinica, we’ll also share early thoughts on how we see OpenClinica, ResearchKit, and OpenClinica Participate fitting together.

It’s Open Source. Great move! We’ll learn more about what this means when the code is released next month.

The technical paper indicates it is a front-end software framework for research, and that they expect it to expand over time as modules are contributed by researchers. Through use of both platforms’ APIs, OpenClinica could serve as a powerful backend and ‘brain’ to ResearchKit.

It’s not clear if data goes through Apple’s servers on its way to a final destination. I also haven’t seen anything from Apple mentioning if it will be portable to other non-iOS platforms (which represent 80% of mobile device market share), though its open source nature would suggest that will be possible.

Surveys. Analogous in many ways to the forms module in OpenClinica Participate, it is a pre-built user interface for question and answer surveys. As somebody who’s worked in this realm for years, I know that this can mean a lot of things. What specific features are supported, how flexible is it, how easy is the build process? Perhaps most important, can it be ported to other mobile app platforms, or to the web?

Informed Consent. The need for fundamental ethical controls for research conduct and data use is just as important in the virtual world as it is in the brick-and-mortar realm, and informed consent is a cornerstone. I’m glad to see ResearchKit taking this on; I don’t expect they have it 100% figured out, but their work with Sage Bionetworks, who has released an open toolkit on Participant Centered Consent, is a great sign.

Active Tasks. Maybe the most exciting component, here’s where ResearchKit takes advantage of the powerful sensors and hardware in the device and provides a way to build interactive tests and activities. In this way, I expect ResearchKit will be a great complementary/alternative frontend to OpenClinica Participate when specialized tests tied to specific, highly-calibrated devices are required.

In general, the promise is big: that technology will lower barriers in a way that leads to fundamental advances in our understanding of human health and breakthrough treatments. That we’ll go from data collected once every three months to once every second, and we’ll encounter–and solve–problems of selection bias, identity management, privacy, and more along the way. And that, according to John Wilbanks at Sage, “there’s coming a day when you’re not going to have an excuse to have a tiny cohort just because you chose not to use digital technologies to engage people.”

The Future of Open Source

It was a privilege for OpenClinica to help with the “Future of Open Source” survey recently completed by Michael Skok of North Bridge Ventures, Black Duck and Forrester. The survey polled users and other stakeholders across the entire spectrum of OSS.

Recently published results from the survey substantiate the idea that open source is ‘eating the software world’s lunch’ (to borrow a phrase from Michael). OSS powers innovation, increases security, and enables a virtuous cycle of proliferation and participation across major sectors of our economy. This is even true in healthcare and life sciences, and we are seeing these trends within the OpenClinica community. People are adopting OpenClinica and other open source research technologies because of the quality, flexibility, and security they provide, not just to save a buck or two.

What I find particularly significant in the results is the increased recognition of quality as a key driver of adopting open source. Eight out of ten survey respondents indicate quality as a factor for increased OSS adoption, vaulting it from the #5 factor in the 2011 survey to #1. In research, quality and integrity of data are paramount. The OpenClinica community’s active (and vocal!) scrutiny of the code and its continuous improvements demonstrate the power of the open source model in producing quality software. Furthermore, working in a regulated environment means you need to do more than just have quality technology; you also must provide documented evidence of its quality and know how to implement it reliably. The transparent development practices of open source are huge contributors to achieving the quality and reliability that clinical trials platforms require. Knowing that feature requests and bug reports are all publicly reported, tracked, and commentable means nothing can hide under the rug. A public source code repository provides a history of all changes to a piece of code. And of course, it greatly helps that many of the key tools and infrastructure that power open source projects are open source themselves.

That’s just one set of factors driving us to a more open, participatory future:

“As a result of all this, Open Source is enjoying a grassroots-led proliferation that starts with a growing number of new developers and extends through the vendors and enterprises to the applications and services, industries and verticals, reaching more people and things in our everyday lives than ever before… there are now over 1 million open source projects with over 100 billion lines of code and 10 million people contributing.”

One thing I predict we’ll see a lot more of in the next year, especially for OpenClinica and life sciences as a whole, is greater interaction between projects and communities. OSS reduces traditional barriers and lets more people ‘get their hands dirty’ with tools and technologies. As OSS tools, libraries, and apps proliferate, innovation will increasingly come from the mashups of these projects.

Follow the survey findings and updates @FutureofOSS and #FutureOSS  

Using Patient-centered Technology to Improve Recruitment and Retention

Sponsors of clinical research must increasingly focus on improving patient engagement in order to meet many of today’s research challenges. Promising disruptions are already under way that could define new models for patient recruitment and retention.

In a time when drug development success is becoming scarcer and more expensive, the industry is looking everywhere it can for new, innovative approaches to improving health. Meeting recruitment goals is one of the biggest challenges for traditional clinical research. Less than one-third of people who come in for a screening end up completing a clinical trial.1 Thinking in a more patient-centric manner can help in recruiting patients. A fundamental idea behind patient-centered research is to “amplify the patient’s role in the research process.”2 Employing new ways to engage patients and physicians, while increasing their level of knowledge and trust, can improve the sponsor’s ability to meet recruitment goals.

One often overlooked factor in study participation and retention is convenience. Raising the level of convenience for both the investigator and the participant can remove a huge driver of non-participation or non-completion. There are many ways to incorporate increased levels of patient and physician convenience into trial design and execution, particularly using Internet-based technologies. For instance, social media can be an effective recruiting tool and an important way to build trust with targeted populations. Disease-specific online communities are becoming more and more prominent for chronic diseases. Matchmaking tools act as mediators that draw together researchers and participants. “Traditional” social media offers a less targeted, but no less effective, way to engage patients and investigators.

In general, the four key determinants of a person’s likelihood to participate in a trial are prior participation in research, existing relationships with researchers, involvement of trusted leaders, and trust in the organization. Keys to recruiting success in social media should keep these determinants in mind, and engage communities in a thoughtful, ethical way while respecting the norms of the community you are targeting.

Participant retention post-recruitment can be improved by strengthening the connections between participants and researchers, and enhancing communication structures to support these relationships.3 Capturing Patient Reported Outcomes electronically (ePRO), through the web or mobile devices, offers a way to interact with the participant in a meaningful way while also capturing critical data. For instance, offering the ePRO user risk scores and health recommendations based on their data, or using gamification techniques to increase protocol adherence, can enhance the traditional ePRO experience by offering direct, immediate value to the user. Enabling a “Bring Your Own Device” (BYOD) strategy can increase convenience for populations who already own their own smartphones or tablets. Of course, the study design and applicable regulatory considerations should drive when and how these techniques are used.

Increased focus on the patient experience is not a phenomenon unique to research, but something that is rapidly permeating healthcare systems. These rapid changes can enhance research engagement. There is enormous potential to capture far more robust data and have better follow up than ever before as widespread infrastructure is put in place for coordinated team-based care, home-based continuous monitoring, and wireless data reporting from medical devices. The (still elusive) promise of using the Electronic Health Record system in research to identify participants and capture clean, accurate trial data is more critical than ever before. As medical practices become more electronic and less paper-driven, investigators and staff should be engaged by providing them trial-specific information at the points in their workflow when they can best make use of it. Conversely, requiring them to go outside the workflows and systems they use in routine practice creates complexity and hassle that can deter research participation. A new level of integration between research and health data systems, based on standards (which exist) and open interfaces (which are coming, as part of Meaningful Use), will be necessary to make good on this potential.

As difficult research questions drive increased complexity in trial designs, many feel that the answer is to use technology in simple, scalable ways to engage more participants in research and capture more data. Dr. Russ Altman, a physician and Stanford professor recently told the New York Times, “There’s a growing sense in the field of informatics that we’ll take lots of data in exchange for perfectly controlled data. You can deal with the noise if the signal is strong enough.”4

References

1. Getz, Ken, The Gift of Participation: A Guide to Making Informed Decisions About Volunteering for a Clinical Trial, 2007, p40.

2. Pignone, Michael, MD, MPH, Challenges to Implementing Patient-Centered Research, Ann Intern Med. 18 September 2012;157(6):450-451

3. Nicholas Anderson, Caleb Bragg, Andrea Hartzler, Kelly Edwards, Participant-centric initiatives: Tools to facilitate engagement in research, Applied & Translational Genomics, Volume 1, 1 December 2012, Pages 25-29, ISSN 2212-0661, 10.1016/j.atg.2012.07.001.

4. http://www.nytimes.com/2013/01/15/health/mining-electronic-records-for-revealing-health-data.html?ref=health?src=dayp&_r=3&

Does EDC Help or Hurt Site Relations?

Getting reluctant clinical research sites to embrace technology such as electronic data capture (EDC) software can be difficult, and a poor rollout is a recipe for troubled relationships between the sponsor/CRO and sites. However, just as a poor EDC implementation can erode sponsor-site relations, a good one can help cultivate improved relationships. Take a look at the new whitepaper, “Improving Site Relationships through EDC,” to learn about some important considerations when thinking about site relations in the context of EDC.

The Evolution of Electronic Data Capture

OpenClinica was recently featured in an article in Genetic Engineering and Biotechnology News titled “Commandeering Data with EDC Systems,” written by Dr. James Netterwald. The article briefly recounts the early days of clinical trial Electronic Data Capture (EDC). But how far have we come? Dr. Netterwald’s title (perhaps unintentionally) conjures up images of struggle and strife, which may be a more apropos description of the journey of Electronic Data Capture than it first appears.

As an industry, it’s taken us a good 20 years to get to where we are, and to be plain, it’s been a slow start. (In my own defense, I, and my company Akaza Research, have only been a party to the industry for the last 5 of those 20 years.) Climbing the evolutionary ladder from shipping laptops to sites to keying data into electronic case report forms is certainly progress by any measure. However, while the days of mailing tapes and disks are over, the days of real electronic data capture are yet to come. Today, most experts agree that somewhere between one-half and two-thirds of all new clinical trials use EDC software, and of these only a very small fraction are “e-source,” defined as collecting data in electronic form at its source as opposed to keying it in from some other source. In some ways it is ironic that cutting-edge biopharmaceutical technologies are themselves developed with technologies that are, relatively speaking, much further down the technology food chain.

Notwithstanding, there are some enterprising few who have pushed the pace towards true EDC. Spaulding Clinical, a large phase 1 unit in West Bend, Wisconsin, has developed a system that automatically captures ECG data from their facility’s patients and directly populates the clinical trial database with these data. A patient wears the ECG device, and the data are transmitted wirelessly to the EDC system. However, this slick and highly productive solution was not developed by either the ECG vendor or the EDC vendor. It was developed by hand by one of Spaulding’s own software developers.

Why isn’t this type of solution more commonplace in clinical trials? What prevents the industry from making the most of today’s information technology? With the strong incentives currently in place to make research more efficient, our field could certainly benefit from some more forward thinking.

– Ben Baumann