Genomics Data Transfer, Analytics, and Machine
Learning using AWS Services AWS Whitepaper
10. An additional step runs an AWS Glue workflow that converts the VCF to Apache Parquet, writes
the Parquet files to a data lake bucket in Amazon S3, and updates the AWS Glue Data Catalog.
11. A bioinformatics scientist works with the data in the Amazon S3 data lake using Amazon Athena through a
Jupyter notebook, the Amazon Athena console, the AWS CLI, or an API. Jupyter notebooks can be launched
from either Amazon SageMaker or AWS Glue. You can also use Amazon SageMaker to train machine
learning models or run inference using data in your data lake.
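
Steps 10 and 11 can be sketched programmatically with the AWS SDK for Python (boto3). The example below is a minimal illustration, not the whitepaper's reference implementation: the workflow name, database, table, column names, and S3 output location are hypothetical placeholders, and the helpers accept any client object that exposes the same methods as the boto3 Glue and Athena clients.

```python
import time


def start_conversion_workflow(glue, workflow_name):
    """Start an AWS Glue workflow run and return its run ID.

    `glue` is expected to be a boto3 Glue client. `workflow_name` is a
    hypothetical name for the VCF-to-Parquet workflow described in step 10.
    """
    response = glue.start_workflow_run(Name=workflow_name)
    return response["RunId"]


def run_athena_query(athena, sql, database, output_location, poll_seconds=2.0):
    """Submit an Athena query, wait for it to finish, and return result rows.

    `athena` is expected to be a boto3 Athena client. `database` is the
    AWS Glue Data Catalog database and `output_location` is an S3 URI where
    Athena writes query results. Both are assumptions for this sketch.
    """
    query_id = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )["QueryExecutionId"]

    # Poll until the query reaches a terminal state.
    while True:
        execution = athena.get_query_execution(QueryExecutionId=query_id)
        state = execution["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(poll_seconds)

    if state != "SUCCEEDED":
        raise RuntimeError(f"Athena query {query_id} ended in state {state}")

    # Returns only the first page of results; larger result sets need pagination.
    return athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]


def main():
    """Example wiring. Requires boto3 and configured AWS credentials."""
    import boto3

    glue = boto3.client("glue")
    athena = boto3.client("athena")

    # All names below are hypothetical placeholders.
    run_id = start_conversion_workflow(glue, "vcf-to-parquet-workflow")
    print("Started Glue workflow run:", run_id)

    rows = run_athena_query(
        athena,
        sql="SELECT * FROM variants LIMIT 10",
        database="genomics_data_lake",
        output_location="s3://example-athena-results/queries/",
    )
    for row in rows:
        print([col.get("VarCharValue") for col in row["Data"]])
```

Passing the clients in as arguments keeps the helpers easy to exercise from a Jupyter notebook or a test with stub clients. In practice the query would be run only after the workflow run has completed and the Data Catalog has been updated; calling `main()` as written starts the workflow and queries immediately, which is only meaningful if the table already exists.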