Life Sciences

Whitepaper: Genomics Data Transfer, Analytics, and Machine Learning using AWS Services

Issue link: https://read.uberflip.com/i/1358110

Running secondary analysis workflows using AWS Step Functions and AWS Batch

Secondary analysis workflows that perform sequence alignment, variant calling, quality control (QC), annotation, and custom processing can be run in AWS using native AWS services. This section provides recommendations and a reference architecture for running secondary analysis workflows in AWS using AWS Step Functions and AWS Batch.

Recommendations

When running secondary analysis workloads in the AWS Cloud, consider the following recommendations.

Use AWS Batch to run tasks in your genomics workflows: Most secondary analysis tasks are perfectly parallel, meaning that they do not depend on one another and can often run at the same time. Provisioning resources on demand for tasks in AWS Batch can be more cost-effective and can deliver better performance than a traditional high-performance computing (HPC) environment.

Use AWS Step Functions to orchestrate tasks in your secondary analysis workflows: Keep task execution separate from task orchestration so that a purpose-built solution is used for each activity, and so that tools and workflows can be deployed independently. This approach limits the impact of any change, which minimizes risk. AWS Step Functions is serverless, which minimizes operational burden.

Package tools in Docker containers and use standard Amazon Machine Images (AMIs): Package tools in Docker containers so they are portable and so you can take advantage of serverless container solutions to orchestrate your container execution. Using standard AMIs removes the operational burden of maintaining machine images.

Package tools independent of workflows in their own Docker container: Package each tool independently so that you can right-size the compute for that tool, which can optimize performance and minimize cost when running each tool container. Multiple workflows can also use the same tool containers.

Treat configuration as code for secondary analysis tools and workflows: Fully automate the build and deployment of secondary analysis tools and workflows. Automation empowers your teams to move quickly while maintaining control, and it provides a repeatable development process for advancing your genomics solution into production.

Use Amazon Elastic Compute Cloud (Amazon EC2) Spot Instances to optimize for cost: Amazon EC2 Spot Instances offer significant savings when you run jobs. You can fall back to On-Demand Instances if Spot Instances are not available.
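The separation between orchestration (AWS Step Functions) and execution (AWS Batch) recommended above can be sketched as a state machine definition in which each state submits one containerized tool as a Batch job. The following is a minimal sketch, not the whitepaper's reference architecture: the step names, job definition names, and queue name are hypothetical, and the code only builds the Amazon States Language document without calling AWS.

```python
import json


def batch_task(name, job_definition, job_queue, next_state=None):
    """Build one Task state that submits an AWS Batch job and waits for it."""
    state = {
        "Type": "Task",
        # The ".sync" suffix makes Step Functions wait for the Batch job to finish.
        "Resource": "arn:aws:states:::batch:submitJob.sync",
        "Parameters": {
            "JobName": name,
            "JobDefinition": job_definition,
            "JobQueue": job_queue,
        },
    }
    if next_state is None:
        state["End"] = True
    else:
        state["Next"] = next_state
    return state


def build_workflow(steps, job_queue):
    """Chain (state_name, job_definition) pairs into a linear state machine."""
    states = {}
    for index, (name, job_definition) in enumerate(steps):
        following = steps[index + 1][0] if index + 1 < len(steps) else None
        states[name] = batch_task(name, job_definition, job_queue, following)
    return {
        "Comment": "Illustrative secondary analysis workflow",
        "StartAt": steps[0][0],
        "States": states,
    }


# Hypothetical tools: each has its own Batch job definition (and container),
# so it can be sized and deployed independently of the workflow.
STEPS = [
    ("Align", "align-tool:1"),
    ("CallVariants", "variant-caller:1"),
    ("Annotate", "annotator:1"),
]

definition = build_workflow(STEPS, "genomics-queue")
print(json.dumps(definition, indent=2))
```

Because each tool is referenced only by its job definition, a second workflow could reuse the same containers by passing a different list of steps to `build_workflow`.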
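One way to express the Spot-first preference with an On-Demand fallback is to attach two compute environments to a single AWS Batch job queue and order them. The sketch below builds a request shaped for the Batch CreateJobQueue operation. The queue and compute environment names are hypothetical, both environments are assumed to exist already, and how Batch spills jobs over to the second environment depends on your compute environment settings.

```python
def job_queue_request(queue_name, spot_env, on_demand_env):
    """Build keyword arguments shaped for AWS Batch's CreateJobQueue call."""
    return {
        "jobQueueName": queue_name,
        "state": "ENABLED",
        "priority": 1,
        # Batch tries compute environments in ascending "order",
        # so the Spot environment is preferred over On-Demand.
        "computeEnvironmentOrder": [
            {"order": 1, "computeEnvironment": spot_env},
            {"order": 2, "computeEnvironment": on_demand_env},
        ],
    }


# Hypothetical names for illustration only.
request = job_queue_request(
    "genomics-queue", "genomics-spot", "genomics-on-demand"
)
print(request)

# With AWS credentials configured, the request could be submitted with boto3:
#   import boto3
#   boto3.client("batch").create_job_queue(**request)
```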
