Whitepaper: Genomics Data Transfer, Analytics, and Machine Learning using AWS Services

Introduction

When running genomics workloads in the Amazon Web Services (AWS) Cloud, how does an organization manage cost, optimize workload performance, and move fast with control? How does it secure sensitive information? What resources are available to help a team meet its compliance needs? And how can it perform analytics using machine learning?

This paper answers these questions by showing how to build a next-generation sequencing (NGS) platform, from instrument to interpretation, using AWS services. We provide recommendations and reference architectures for developing the platform, including: 1) transferring genomics data to the AWS Cloud and establishing data access patterns, 2) running secondary analysis workflows, 3) performing tertiary analysis with data lakes, and 4) performing tertiary analysis using machine learning.

The genomics market is highly competitive, so a development lifecycle that lets you move fast with control is critical. Solutions for three of the reference architectures in this paper are provided as AWS Solutions Implementations. These solutions use continuous delivery (CD), which lets you develop each solution further to fit your organization's needs.

Note

To access an AWS Solutions Implementation that provides an AWS CloudFormation template to automate deployment of the secondary analysis solution in the AWS Cloud, see the Genomics Secondary Analysis Using AWS Step Functions and AWS Batch Implementation Guide.

To access an AWS Solutions Implementation that provides an AWS CloudFormation template to automate deployment of the tertiary analysis and data lakes solution in the AWS Cloud, see the Genomics Tertiary Analysis and Data Lake Using AWS Glue and Amazon Athena Implementation Guide.
To access an AWS Solutions Implementation that provides an AWS CloudFormation template to automate deployment of the tertiary analysis and machine learning solution in the AWS Cloud, see the Genomics Tertiary Analysis and Machine Learning using Amazon SageMaker.

Table 1 summarizes the services used in this platform. You can learn about the compliance resources available to you in Compliance resources (p. 19).

Table 1 – AWS services for data transfer, secondary analysis, and tertiary analyses

Data Transfer
- Data access patterns: AWS DataSync, AWS Storage Gateway for files
- Cost optimization: AWS DataSync, Amazon S3

Secondary Analysis
- Secondary analysis: AWS Step Functions, AWS Batch
- Monitor & alert: Amazon CloudWatch
- DevOps: AWS CodeCommit, AWS CodeBuild, AWS CodePipeline

Tertiary Analysis
- Data lakes: Amazon Athena, AWS Glue
- Machine learning: Amazon SageMaker
- DevOps: AWS CodeCommit, AWS CodeBuild, AWS CodePipeline
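
Each of the AWS Solutions Implementations above is deployed from an AWS CloudFormation template. The following is a minimal sketch of launching such a template programmatically with boto3 rather than through the console. It assumes the following:
- You have installed boto3 and configured AWS credentials.
- You copy the template URL from the relevant Implementation Guide. This paper does not give the URL.
- The template creates IAM resources, so the code acknowledges IAM capabilities.

The function name `deploy_solution_stack` and its parameters are hypothetical illustrations and are not part of any AWS solution.

```python
from typing import Any, Dict, Optional


def deploy_solution_stack(
    stack_name: str,
    template_url: str,
    parameters: Optional[Dict[str, str]] = None,
    client: Any = None,
    wait: bool = True,
) -> str:
    """Create a CloudFormation stack from a template URL and return its stack ID.

    Hypothetical helper for illustration; not part of the AWS solutions.
    """
    if client is None:
        # Imported lazily so the function can be exercised with a stub client.
        import boto3

        client = boto3.client("cloudformation")

    # CloudFormation expects parameters as a list of key/value dictionaries.
    cfn_parameters = [
        {"ParameterKey": key, "ParameterValue": value}
        for key, value in (parameters or {}).items()
    ]

    response = client.create_stack(
        StackName=stack_name,
        TemplateURL=template_url,
        Parameters=cfn_parameters,
        # Assumption: the template creates IAM resources, which requires an
        # explicit acknowledgement; adjust to what the template actually needs.
        Capabilities=["CAPABILITY_IAM", "CAPABILITY_NAMED_IAM"],
    )
    stack_id = response["StackId"]

    if wait:
        # Poll until the stack reaches CREATE_COMPLETE; boto3 raises a
        # WaiterError if creation fails or the stack rolls back.
        client.get_waiter("stack_create_complete").wait(StackName=stack_id)

    return stack_id
```

For example, you might call `deploy_solution_stack("genomics-secondary-analysis", template_url)`. Here the stack name is a hypothetical placeholder and `template_url` comes from the Genomics Secondary Analysis Implementation Guide. Passing in the `client` makes the helper easy to test without touching AWS. Waiting on `stack_create_complete` makes a scripted deployment fail fast if the stack rolls back.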
