TDWI Checklist Report: Cloud Data Warehousing

Issue link: https://read.uberflip.com/i/1354048


6 WHAT IS A DATA LAKEHOUSE?

We have differentiated between a data warehouse and a data lake. Both have benefits and drawbacks that would inspire a data consumer to choose one paradigm over the other. Data in the data warehouse is cleansed and well-organized to simplify reporting and analysis, but it is limited because the included data sets are subject to filtering and transformation. Data lakes allow for a much broader array of data options, but when the data sets remain in their original raw state, there are bound to be inconsistencies that potentially impact trust in the analytics insights derived from them.

In a modern analytics environment, neither approach alone is likely to meet every analyst's needs. However, you do not have to choose one approach over the other. These two paradigms are not mutually exclusive, and your organization can benefit from the synergy that emerges from blending the approaches.

This blended paradigm, known as a data lakehouse, looks at how the data warehouse and the data lake can complement each other and deliver the best of both worlds. A data lakehouse is characterized as a reporting/analytics/BI environment that provides a semantic layer harmonizing access to the structured, semistructured, and unstructured data assets managed in the combined data landscape afforded by the warehouse and the data lake.

A data lakehouse provides:

• Standardized data storage formats in object storage (such as using the columnar alignment provided by Apache Parquet)
• Separation of storage from compute, freeing data consumers from the limitations imposed by monolithic databases
• Virtual layering of structured schema over data managed in object storage
• Integrated governance over population, management, and access to data in object storage
• Support for query access to structured data in the data warehouse as well as both structured and semistructured data in the data lake using schema-on-read
• Capabilities for cross-platform integration (e.g., queries joining database tables with semistructured data sets)
• Practically unlimited reusability, in that you can load the data once into the data lakehouse and use that data resource as part of any number of advanced analytics workflows as well as attaching or loading the data directly into the cloud data warehouse's more structured reporting and analysis workflows

Essentially, the data lakehouse approach smooths integration and access across the data landscape and allows for even greater flexibility for the different consumer communities.
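
Two of the capabilities listed in this section, schema-on-read and queries that join database tables with semistructured data sets, can be illustrated with a small sketch. This is only a toy under stated assumptions, not how any particular lakehouse product works: an in-memory SQLite database stands in for the cloud data warehouse, a string of newline-delimited JSON stands in for raw files in object storage, and every name (RAW_EVENTS, EVENT_SCHEMA, read_with_schema, spend_by_customer, the customers and events tables) is hypothetical.

```python
import io
import json
import sqlite3

# Hypothetical raw "data lake" object: newline-delimited JSON with uneven fields.
RAW_EVENTS = """\
{"customer_id": 1, "event": "click", "amount": "19.50"}
{"customer_id": 2, "event": "purchase", "amount": 40}
{"customer_id": 1, "event": "purchase"}
"""

# Schema applied at read time (schema-on-read): field name -> (caster, default).
EVENT_SCHEMA = {
    "customer_id": (int, None),
    "event": (str, "unknown"),
    "amount": (float, 0.0),
}


def read_with_schema(raw_text, schema):
    """Parse raw JSON lines and project each record onto the given schema."""
    rows = []
    for line in io.StringIO(raw_text):
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        row = {}
        for field, (cast, default) in schema.items():
            value = record.get(field)
            # Missing fields fall back to the default instead of failing the read.
            row[field] = cast(value) if value is not None else default
        rows.append(row)
    return rows


def spend_by_customer(raw_text):
    """Join lake events with a structured warehouse table and total the spend."""
    conn = sqlite3.connect(":memory:")
    # Structured, curated "warehouse" table (assumed contents).
    conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ada"), (2, "Grace")])
    # Structured layer over the raw data: a temp table filled at query time.
    conn.execute("CREATE TEMP TABLE events (customer_id INTEGER, event TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO events VALUES (:customer_id, :event, :amount)",
        read_with_schema(raw_text, EVENT_SCHEMA),
    )
    query = """
        SELECT c.name, SUM(e.amount)
        FROM customers AS c
        JOIN events AS e ON e.customer_id = c.customer_id
        GROUP BY c.name
        ORDER BY c.name
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result
```

The raw records are stored once and left untouched; the schema is a property of the read, so a different workflow could apply a different schema to the same data without reloading it.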
