TDWI Checklist Report: Cloud Data Warehousing

Issue link: https://read.uberflip.com/i/1354048


5 LEVERAGING A DATA LAKE

OK, you are convinced that it makes sense to migrate your data warehouse to the cloud. Yet when you review the technology media, you find considerable buzz about data lakes. What is a data lake? Can it replace the data warehouse? Why is so much written about failed data lake implementations?

The growing number of reporting/analytics consumers can be differentiated into distinct consumer communities, including traditional data analysts, citizen data analysts (those looking for transparent capabilities for producing simple reports), informed business analysts (who are a little more savvy), and more sophisticated data scientists who want access to massive data volumes in their original forms. For some of these analysts, the rigid structure of the processes that ingest, process, and transform data for loading into a data warehouse often "washes out" information with potential for analytics insights. Alternatively, data scientists may want greater control over their own data pipelines for data preparation.

This is where the data lake comes in: it provides a curated repository of source data sets in their original formats and makes those data sets available to a variety of consumers. TDWI defines a data lake as an unstructured data repository that contains information available for analysis. A data lake ingests data in its raw, original state, straight from data sources, without any cleansing, standardization, remodeling, or transformation.

In essence, a data lake is distinguished from the existing data warehouse by providing a more flexible platform for data availability and accessibility. In contrast with a data warehouse, a data lake imposes fewer limitations on what data elements are available for use, allows for data storage at scale, and can accommodate structured, semistructured, and unstructured data.

To address the risk of data lake failure, organizations are instituting processes for data curation and governance to assess data lake assets, document their structural and object metadata in a data catalog, and help data consumers find and use the optimal data assets for their specific needs.
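
As a rough illustration of the ideas in this section, the following Python sketch copies a source file into a lake directory without altering it and records basic metadata about it in a simple in-memory catalog. It is only a minimal sketch under stated assumptions: the function names (ingest_raw, find_assets), the extension-to-category mapping, and the catalog fields are all hypothetical and are not taken from the TDWI report or from any particular data lake product.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path

# Hypothetical mapping from file extension to the three data categories
# named in the text; a real lake would detect formats more carefully.
KIND_BY_SUFFIX = {
    ".csv": "structured",
    ".json": "semistructured",
    ".txt": "unstructured",
}


def ingest_raw(source: Path, lake_root: Path, catalog: list) -> dict:
    """Copy a source file into the lake byte-for-byte and catalog it."""
    lake_root.mkdir(parents=True, exist_ok=True)
    target = lake_root / source.name
    shutil.copyfile(source, target)  # no cleansing or transformation
    entry = {
        "name": source.name,
        "path": str(target),
        "kind": KIND_BY_SUFFIX.get(source.suffix.lower(), "unknown"),
        "size_bytes": target.stat().st_size,
        "sha256": hashlib.sha256(target.read_bytes()).hexdigest(),
    }
    catalog.append(entry)  # the list stands in for a data catalog
    return entry


def find_assets(catalog: list, kind: str) -> list:
    """Return the names of cataloged assets of the requested kind."""
    return [e["name"] for e in catalog if e["kind"] == kind]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "orders.csv"
        src.write_text("id,amount\n1,9.50\n")
        catalog = []
        ingest_raw(src, Path(tmp) / "lake", catalog)
        print(find_assets(catalog, "structured"))
```

The point of the sketch is the division of labor: the stored bytes stay identical to the source, while the catalog entry carries the descriptive metadata that helps a data consumer locate a suitable asset.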
