Life Sciences

Data Lakes: How to glean new insights from existing data


Modern biopharma companies are data businesses. Companies answer every question from drug discovery to market access by generating and analyzing rich data. This creates a wealth of data that is used once then siloed, limiting its value. There is another way, though. Managed properly, old data yields new insights.

Today's data strategies recognize the value of old data and the interconnectivity of teams within companies. As activities in one functional area have implications for the broader business, organizations are eliminating silos that keep datasets generated by different teams apart. This allows companies to correlate data from each functional area across the lifecycle of the drug, from discovery through to commercialization.

The value of having ready access to historic data in its raw form is clear, too. No researcher knows all the questions to ask a dataset on the day it is generated. Science and the companies that perform it move forward, continually revealing new questions. As this happens, researchers who refer back to historic data and, better still, combine it with data from other sources are best placed to answer the questions they face, be they related to R&D, manufacturing or commercialization.

If new evidence links a gene to a phenotype, there is value in having access to historic sequencing data and accompanying medical records to build on the breakthrough. If vaccine yield at a manufacturing facility drops, there is value in having access to years of data on every aspect of the operation to look for patterns that explain the trend. When commercializing a drug, there is value in creating a real-world evidence ecosystem to demonstrate the value of the product to payers, physicians and patients. And there is value, even necessity under Quality by Design approaches, in using clinical production data to inform manufacturing scale-up.
There is also value in providing intellectually curious people with sandboxes of data. Science moves forward on a succession of "what if?" questions. Companies with centralized repositories of all their data have unparalleled capacity to answer such questions. They can also reveal "unknown unknowns", insights they never knew to look for but were able to uncover by exploring their data. Data scientists working in fields as diverse as process development and observational research want to perform such analyses. The problem is today's data analytics pipelines are better equipped to answer predefined questions.

POOLING DATA TO DRIVE DISCOVERIES

Today, enterprise data warehouses form the backbone of analytics pipelines. Before entering the warehouse, data undergoes a process known as extract, transform and load (ETL). The goal is to pull data from source systems, transform it and load it into the warehouse. The data is then ready to use in reporting and business intelligence. This gives companies a single source of validated data everyone uses to fuel analyses.

Companies need this single source of truth but many are also realizing it cannot meet the needs of all their users. Transforming data prior to storage limits its use to business intelligence. If users want to perform
