47Lining Data Lake Foundation
Quick Start Architecture - Components
Ingest
The ingestion process accepts batch submissions to an S3 submissions bucket and streamed submissions to Amazon Kinesis Firehose.
Submissions are indexed in Elasticsearch, triggered by S3 event notifications.
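
The two ingestion paths can be sketched as request builders for the AWS SDK for Python. This is a minimal sketch, not the Quick Start's own code: the function names and any bucket, key, or stream names passed to them are hypothetical, and the `submit` helper assumes boto3 is installed and AWS credentials are configured.

```python
import json


def build_batch_submission(bucket: str, key: str, payload: bytes) -> dict:
    """Keyword arguments for an S3 put_object call (batch path)."""
    return {"Bucket": bucket, "Key": key, "Body": payload}


def build_stream_submission(stream_name: str, record: dict) -> dict:
    """Keyword arguments for a Firehose put_record call (streaming path)."""
    # Newline-delimit each record so downstream readers can split them.
    data = (json.dumps(record, sort_keys=True) + "\n").encode("utf-8")
    return {"DeliveryStreamName": stream_name, "Record": {"Data": data}}


def submit(batch_kwargs: dict, stream_kwargs: dict) -> None:
    """Send both submissions; requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the builders work without the SDK

    boto3.client("s3").put_object(**batch_kwargs)
    boto3.client("firehose").put_record(**stream_kwargs)
```

Keeping the request construction separate from the SDK calls lets the payload shape be checked without touching AWS.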
Data Set Management
Curated datasets are the result of Redshift transformations and AWS Kinesis Analytics processing. They reside in a dedicated S3 bucket and are also indexed in
Elasticsearch.
Transform, Aggregate, Analyze
Analyses transform, aggregate, and process curated datasets. Ad hoc analyses can be run in AWS Athena, and AWS Redshift Spectrum makes it easy to join
dimensional data with facts.
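
An ad hoc Athena analysis might be started as in the sketch below. The database, table, and output location are hypothetical placeholders supplied by the caller, and the row-count query only stands in for a real analysis. `run_query` assumes boto3 is installed and AWS credentials are configured.

```python
def build_athena_query(database: str, table: str, output_location: str) -> dict:
    """Keyword arguments for an Athena start_query_execution call."""
    # A simple aggregate over one curated table; real analyses would differ.
    query = f'SELECT COUNT(*) AS row_count FROM "{table}"'
    return {
        "QueryString": query,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_location},
    }


def run_query(params: dict) -> str:
    """Start the query and return its execution id; requires boto3."""
    import boto3  # imported lazily so the builder works without the SDK

    response = boto3.client("athena").start_query_execution(**params)
    return response["QueryExecutionId"]
```

Athena runs queries asynchronously, so the returned execution id would then be polled for completion before results are read from the output location.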
Search
Search is enabled by metadata indexing in Elasticsearch and exposed through Kibana dashboards. Objects are indexed in reaction to S3 events published
to an SNS topic to which an indexing Lambda function is subscribed.
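
The S3-to-SNS-to-Lambda indexing path can be illustrated with a handler that unwraps the SNS envelope and extracts object metadata. This is a minimal sketch under assumptions: the document fields and function names are hypothetical, and the actual write to Elasticsearch is left out because the source does not describe the index layout.

```python
import json
from urllib.parse import unquote_plus


def extract_index_documents(sns_event: dict) -> list:
    """Turn an SNS-wrapped S3 event into metadata documents for indexing."""
    docs = []
    for sns_record in sns_event.get("Records", []):
        # SNS delivers the S3 notification as a JSON string in Sns.Message.
        s3_event = json.loads(sns_record["Sns"]["Message"])
        for rec in s3_event.get("Records", []):
            obj = rec["s3"]["object"]
            docs.append({
                "bucket": rec["s3"]["bucket"]["name"],
                # Object keys arrive URL-encoded in S3 notifications.
                "key": unquote_plus(obj["key"]),
                "size": obj.get("size"),
                "event_name": rec.get("eventName"),
                "event_time": rec.get("eventTime"),
            })
    return docs


def handler(event, context):
    """Lambda entry point; sending docs to Elasticsearch is omitted here."""
    docs = extract_index_documents(event)
    return {"documents": len(docs)}
```

Messages without a `Records` list, such as the test notification S3 sends when a notification is first configured, simply yield no documents.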
Publish Data
Data are published to a published data bucket. The publishing process moves and transforms data from the curated datasets bucket to the published data bucket for downstream
consumers such as AWS QuickSight.
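
As an illustration of the transform step in publishing, the sketch below converts a curated object from JSON Lines to CSV and derives the key it would be written under in the published data bucket. The source does not say which formats are used, so both formats, the function name, and the key naming are assumptions made for illustration only.

```python
import csv
import io
import json
import posixpath


def curated_to_published(curated_key: str, curated_body: str) -> tuple:
    """Return (published_key, csv_text) for one curated JSON Lines object."""
    rows = [json.loads(line) for line in curated_body.splitlines() if line.strip()]
    # Use the union of field names so rows with missing fields still fit.
    fieldnames = sorted({name for row in rows for name in row})
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    # Keep the same path in the published bucket, with a .csv extension.
    published_key = posixpath.splitext(curated_key)[0] + ".csv"
    return published_key, buffer.getvalue()
```

The caller would read the body from the curated bucket and write the result to the published bucket, keeping this function free of AWS calls.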
Visualize
Published data can be visualized with AWS QuickSight.