Learn to Avoid Life Sciences Data Access Pitfalls

July 30, 2019

Data drives the life sciences. Data supports the development of new products and enables agile decision making. But for a field so completely reliant on data, the industry is struggling to find methods to adequately handle that data. Ideally, there would be centralized repositories where data is accessible, safe, and organized regardless of the format or size. Instead, there are numerous data silos spread among the different contributors to a specific project.

Downstream, managers lack access to the data that would enable strategic decision making, and researchers spend time redoing elaborate experiments because they have a graph but not the original data behind it. Researchers have better things to do than rebuild lost data from an Excel spreadsheet. Avoid these issues by addressing data access challenges.

Data Access Challenges

Data produced by a single drug program can be accessed, viewed, and analyzed by many contributors over a span of years. The issue is not a single research scientist carefully compiling their work, but the multiple streams of collaborators, such as contractors, CROs, clinical centers, regulatory functions, and R&D scientists, who all need secure access to the data. Integrating data from multiple people, functions, and companies into a single, comprehensive source for use in product development, manufacturing SOPs, regulatory applications, marketing campaigns, proof of concept, and so on can become a fundamental struggle. Below are some of the unique challenges faced when combining data from multiple sources to support a single project.

It is common practice to hire contractors to handle increased workloads, pursue interesting side projects, or repeat experiments to confirm previous findings. It’s also common for a new employee to be handed the project months later, with much of the data, metadata, and context the contractor generated missing. This is indicative of a broader problem. On average it takes a decade to bring a new drug from discovery to market. Yet according to the Bureau of Labor Statistics, the median time with a single employer is 4.2 years. This fundamental mismatch, combined with the number of employees supporting a single project, ensures a lack of continuity.

The original creator of the data is one of many people who need access to that data in both raw and analyzed formats. The ability to generate a high volume of critical data is meaningless if that data is not both accessible and contextualized. There needs to be a company-wide system for organizing archived data that ensures continuity and data fidelity.

Sharing Data Securely

Whether it’s clinical collaborators or a CDMO hired for a specific task, there is a need to both disseminate and centralize data with the appropriate firewalls in place. These external contributors certainly need access to some data, such as protocols and updates on progress, but not unfettered access to the whole pipeline. Similarly, as the employer, you need access to most of their data but might want to impose restrictions to comply with data regulations, such as HIPAA. Sending hundreds of files back and forth over email or through rogue Dropbox accounts is not the answer. Ideally, there would be a centralized source for data that enables selective data sharing with partners without sacrificing data security.
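As a rough illustration of selective sharing, the sketch below filters a central file list by partner role. Everything in it is hypothetical: the role names, the file categories, and the `visible_files` helper are assumptions made for this example, not a description of any particular platform, and a real deployment would rely on the access controls of the storage system itself.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class DataFile:
    name: str
    category: str  # hypothetical categories, e.g. "protocol", "raw_data"


# Hypothetical policy: which categories each kind of contributor may see.
ACCESS_POLICY: Dict[str, FrozenSet[str]] = {
    "internal_rnd": frozenset({"protocol", "progress_update", "raw_data"}),
    "cdmo": frozenset({"protocol", "progress_update"}),
    "clinical_site": frozenset({"protocol", "patient_record"}),
}


def visible_files(role: str, files: List[DataFile]) -> List[DataFile]:
    """Return only the files whose category the role is allowed to see."""
    # Roles missing from the policy get an empty allow-list, so they see nothing.
    allowed = ACCESS_POLICY.get(role, frozenset())
    return [f for f in files if f.category in allowed]


files = [
    DataFile("assay_protocol_v3.pdf", "protocol"),
    DataFile("weekly_update.docx", "progress_update"),
    DataFile("plate_reader_run_12.csv", "raw_data"),
    DataFile("site_04_enrollment.xlsx", "patient_record"),
]

cdmo_view = [f.name for f in visible_files("cdmo", files)]
```

The design choice worth noting is the default: a role that does not appear in the policy is shown nothing, so a new partner has to be granted access explicitly rather than inheriting it by accident.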

Storing Metadata Uniformly

Metadata contextualizes data and gives it meaning. Without it, even an impeccably archived data set can become useless. This critical information centers around how the data was collected, what methods were used, and what materials were involved; it is what enables comparison and the identification of trends.

The challenge around capturing metadata is that, from a research perspective, this information may only be jotted down in a lab notebook: “Experiment performed at 32°C,” or “Treatment was applied for 72 hours followed by 3 washes with PBS for 10 min each while shaking.” Similarly, in the case of an automated manufacturing protocol, there could be an hourly readout of temperatures, masses, or off-line tests (e.g. NMR spectra). There is no consistent data structure, and getting that data from a lab notebook or equipment readout into a robust, consistent, shareable format is no small challenge.
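To make the idea of a consistent structure concrete, here is a minimal sketch of how the notebook entries quoted above might be captured as a structured record. The field names, the `ExperimentMetadata` and `WashStep` classes, and the experiment ID are all hypothetical, and the two notebook notes are assumed to describe the same experiment purely for illustration.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class WashStep:
    buffer: str
    duration_min: float
    repeats: int
    shaking: bool = False


@dataclass
class ExperimentMetadata:
    experiment_id: str            # hypothetical identifier scheme
    temperature_c: float          # units live in the field name, not in free text
    treatment_duration_h: float
    washes: List[WashStep] = field(default_factory=list)

    def to_json(self) -> str:
        """Serialize to JSON with sorted keys so records compare consistently."""
        return json.dumps(asdict(self), sort_keys=True)


# The free-text notebook entry, restated as explicit, typed fields.
record = ExperimentMetadata(
    experiment_id="EXP-0001",
    temperature_c=32.0,
    treatment_duration_h=72.0,
    washes=[WashStep(buffer="PBS", duration_min=10.0, repeats=3, shaking=True)],
)

serialized = record.to_json()
```

Once every experiment is recorded with the same fields and units, records from different scientists, contractors, or instruments can be compared and searched without anyone having to decipher the original handwriting.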

Remember This...

The solution is a paradigm shift in which data integrity, meaning quality, associated metadata, and accessibility to the larger organization, is treated as integral to the data's value.

The tools are data sharing systems where:

Shared digital lab notebooks replace paper notebooks and enable real-time access and collaboration, and
File sharing systems and platforms serve as the first place to store data, ensuring a secure, up-to-date, centralized location that is impervious to an employee's retirement, the death of a laptop, or a flooded server room.

Take a look at how Egnyte is helping its life sciences customers address these issues.