Genomics Data Transfer, Analytics, and Machine
Learning using AWS Services AWS Whitepaper
Reference architecture
Figure 4: Tertiary analysis with machine learning using Amazon SageMaker reference architecture
An AWS Glue job creates the machine learning training set. Jupyter notebooks are used to
generate the machine learning pipelines, explore the datasets, generate predictions, and
interpret the results.
An AWS Glue job ingests the data used for training a machine learning model, adds the features used for
training, and writes the resulting dataset to an Amazon S3 bucket. A Jupyter notebook is then run to generate
a machine learning pipeline with Amazon SageMaker Autopilot, which produces two notebooks. These two
notebooks capture the data structures and statistics, feature engineering steps, algorithm selection,
hyperparameter tuning steps, and the deployment of the best performing model. Users can either
use the plan recommended by SageMaker Autopilot or modify the generated notebook to
influence the final results. A prediction notebook is used to generate predictions and evaluate model
performance using a test dataset.
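The Autopilot step described above can be sketched in Python with the AWS SDK (boto3). This is a minimal illustration, not the code from the solution itself: the job name, S3 URIs, target column, and IAM role ARN are hypothetical placeholders, and the helper function names are assumptions made for this example. The request fields follow the SageMaker `create_auto_ml_job` API, and the two generated notebooks are read from the `AutoMLJobArtifacts` field of the `describe_auto_ml_job` response.

```python
from typing import Any, Dict, Optional


def build_autopilot_request(
    job_name: str,
    train_s3_uri: str,
    output_s3_uri: str,
    target_column: str,
    role_arn: str,
    max_candidates: int = 10,
    notebooks_only: bool = False,
) -> Dict[str, Any]:
    """Assemble keyword arguments for SageMaker's create_auto_ml_job call.

    All argument values are supplied by the caller; nothing here is taken
    from the whitepaper's actual deployment.
    """
    return {
        "AutoMLJobName": job_name,
        "InputDataConfig": [
            {
                # Training set written to Amazon S3 by the AWS Glue job.
                "DataSource": {
                    "S3DataSource": {
                        "S3DataType": "S3Prefix",
                        "S3Uri": train_s3_uri,
                    }
                },
                "TargetAttributeName": target_column,
            }
        ],
        "OutputDataConfig": {"S3OutputPath": output_s3_uri},
        "AutoMLJobConfig": {
            "CompletionCriteria": {"MaxCandidates": max_candidates}
        },
        "RoleArn": role_arn,
        # True stops after the notebooks are generated, so the plan can be
        # reviewed or modified before any candidate models are trained.
        "GenerateCandidateDefinitionsOnly": notebooks_only,
    }


def notebook_locations(describe_response: Dict[str, Any]) -> Dict[str, Optional[str]]:
    """Extract the S3 locations of the two Autopilot-generated notebooks."""
    artifacts = describe_response.get("AutoMLJobArtifacts", {})
    return {
        "candidate_definition": artifacts.get("CandidateDefinitionNotebookLocation"),
        "data_exploration": artifacts.get("DataExplorationNotebookLocation"),
    }


def launch_autopilot_job(request: Dict[str, Any]) -> Dict[str, Any]:
    """Submit the job and return its current description (requires AWS credentials)."""
    import boto3  # imported lazily so the helpers above work without AWS access

    sagemaker = boto3.client("sagemaker")
    sagemaker.create_auto_ml_job(**request)
    return sagemaker.describe_auto_ml_job(AutoMLJobName=request["AutoMLJobName"])
```

Setting `notebooks_only=True` corresponds to the second option in the text: Autopilot produces the notebooks, and the user edits the generated notebook before running the pipeline. The notebook locations appear in the job description only after Autopilot has finished its data analysis step, so `notebook_locations` may return `None` values for a job that has just been submitted.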
Note
To access an AWS Solutions Implementation that provides an AWS CloudFormation template
to automate the deployment of the solution in the AWS Cloud, see Genomics Tertiary
Analysis and Machine Learning Using Amazon SageMaker.