Data lakes: How to glean new insights from existing data
Modern biopharma companies are data businesses. Companies answer every question from drug discovery to market access by generating and analyzing rich data. This creates a wealth of data that is used once then siloed, limiting its value. There is another way, though. Managed properly, old data yields new insights.
Today's data strategies recognize the value
of old data and the interconnectivity of
teams within companies. As activities in
one functional area have implications
for the broader business, organizations
are eliminating silos that keep datasets
generated by different teams apart. This
allows companies to correlate data from
each functional area across the lifecycle
of the drug, from discovery through to
commercialization.
The value of having ready access to
historic data in its raw form is clear, too.
No researcher knows all the questions to ask a dataset
on the day it is generated. Science and the companies
that perform it move forward, continually revealing
new questions. As this happens, researchers who
refer back to historic data and, better still, combine
it with data from other sources are best placed to
answer the questions they face, be they related to
R&D, manufacturing or commercialization.
If new evidence links a gene to a phenotype, there
is value in having access to historic sequencing data
and accompanying medical records to build on the
breakthrough. If vaccine yield at a manufacturing
facility drops, there is value in having access to
years of data on every aspect of the operation to
look for patterns that explain the trend. When
commercializing a drug, there is value in creating
a real-world evidence ecosystem to
demonstrate the value of the product
to payers, physicians and patients. And
there is value, even necessity under
Quality by Design approaches, in using
clinical production data to inform
manufacturing scale-up.
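The vaccine-yield scenario can be made concrete with a small Python sketch. It combines two data sources and looks for patterns. Everything in it is hypothetical: the batch IDs, the parameter names (`temperature_c`, `ph`), the values, and the two dictionaries standing in for a process historian and a separate quality system. The sketch joins the sources on a shared batch ID, then ranks each process parameter by its Pearson correlation with yield.

```python
from math import sqrt

# Hypothetical process records from a manufacturing historian, keyed by batch ID.
process_data = {
    "B001": {"temperature_c": 36.9, "ph": 7.10},
    "B002": {"temperature_c": 37.0, "ph": 7.05},
    "B003": {"temperature_c": 37.4, "ph": 7.12},
    "B004": {"temperature_c": 37.8, "ph": 7.08},
}

# Hypothetical yield records (percent) from a separate quality system, same batch IDs.
yield_data = {"B001": 98.0, "B002": 97.5, "B003": 93.0, "B004": 89.5}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def correlate_with_yield(process, yields):
    """Join both sources on batch ID, then correlate each parameter with yield."""
    shared = sorted(process.keys() & yields.keys())  # batches present in both systems
    params = process[shared[0]].keys()
    y = [yields[b] for b in shared]
    return {p: pearson([process[b][p] for b in shared], y) for p in params}


scores = correlate_with_yield(process_data, yield_data)
# List parameters from strongest to weakest relationship with yield.
for param, r in sorted(scores.items(), key=lambda kv: -abs(kv[1])):
    print(f"{param}: r = {r:+.2f}")
```

A strong correlation is a lead worth investigating, not proof that a parameter caused the drop. With years of real operational data, the same join-then-compare step would simply run over many more batches and parameters.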
There is also value in providing intellectually curious people with sandboxes of data. Science moves forward on a succession of "what if?" questions. Companies with centralized repositories of all their data have unparalleled capacity to answer such questions. They can also reveal "unknown unknowns", insights they never knew to look for but were able to uncover by exploring their data.
Data scientists working in fields as diverse as process development and observational research want to perform such analyses. The problem is that today's data analytics pipelines are better equipped to answer predefined questions than open-ended, exploratory ones.
POOLING DATA TO DRIVE DISCOVERIES
Today, enterprise data warehouses form the backbone of analytics pipelines. Before entering the warehouse, data undergoes a process known as extract, transform and load (ETL). Data is pulled from source systems, cleaned and converted to fit the warehouse's predefined schema, and then loaded into the warehouse. The data is then ready to use in reporting and business intelligence. This gives companies a single source of validated data that everyone uses to fuel analyses.
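The three ETL steps can be sketched in a few lines of Python, using only the standard library. This is an illustration under stated assumptions, not a production pipeline. The CSV export, its column names, and the `assay_results` table are hypothetical, and an in-memory SQLite database stands in for an enterprise warehouse. The transform step keeps only the columns the warehouse schema defines. A free-text field such as `operator_note` is discarded before storage.

```python
import csv
import io
import sqlite3

# Hypothetical raw export from a source system (e.g. a lab instrument), as CSV text.
RAW_EXPORT = """sample_id,assay,result,unit,operator_note
S1,potency,0.92,fraction,
S2,potency,87,percent,rerun after calibration
S3,potency,,percent,instrument fault
"""


def extract(text):
    """Extract: read rows from the source system's export."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: reject incomplete rows and normalize results to percent.

    Only the fields in the warehouse schema survive; operator_note is dropped.
    """
    clean = []
    for row in rows:
        if not row["result"]:
            continue  # validation: skip rows with no result
        value = float(row["result"])
        if row["unit"] == "fraction":
            value *= 100.0  # convert fractions to percent
        clean.append((row["sample_id"], row["assay"], value))
    return clean


def load(conn, records):
    """Load: write validated records into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS assay_results "
        "(sample_id TEXT PRIMARY KEY, assay TEXT, result_pct REAL)"
    )
    conn.executemany("INSERT INTO assay_results VALUES (?, ?, ?)", records)
    conn.commit()


warehouse = sqlite3.connect(":memory:")  # stand-in for an enterprise warehouse
load(warehouse, transform(extract(RAW_EXPORT)))
print(warehouse.execute("SELECT * FROM assay_results ORDER BY sample_id").fetchall())
```

The warehouse ends up clean and consistent. However, a later question about the rejected S3 row or the operator notes can no longer be answered from it, because that detail never reached storage.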
Companies need this single source of truth, but many are also realizing it cannot meet the needs of all their users. Transforming data prior to storage limits its use to business intelligence. If users want to perform