Life Sciences

Whitepaper: Genomics Data Transfer, Analytics, and Machine Learning using AWS Services

Issue link: https://read.uberflip.com/i/1358110

Appendix H: Handling secondary analysis task errors and workflow failures

If your Amazon Elastic Compute Cloud (Amazon EC2) Spot Instances are interrupted, jobs running on those instances will fail. Amazon EC2 terminates, stops, or hibernates your Spot Instance when the price exceeds the maximum price for your request or capacity is no longer available. If retries are enabled for the given task in the AWS Step Functions state machine and retries remain, the job is resubmitted to AWS Batch. If a Spot Instance is available, it is used to run the task. If no Spot Instances are available and an On-Demand compute environment is next in the compute environments list for the queue, AWS Batch provisions an On-Demand Instance to run the task.

Verify that you have configured failover from a Spot Instance compute environment to an On-Demand Instance compute environment for a given job queue. Also verify that you have configured retries on tasks in your Step Functions state machine. You can also specify a try/catch block to handle complex retry scenarios or to run custom code in the catch block, such as submitting a ticket to an issue tracking system. For more information about how AWS Step Functions handles errors, see Error Handling in Step Functions in the AWS Step Functions Developer Guide.

Many organizations build in the ability to resume a secondary analysis state machine execution after it was interrupted by a job failure. The idea is to skip jobs that have already produced their output files and resume the state machine execution at the job that was next when resources were interrupted. This is commonly referred to as checkpointing. One approach is to add a gate clause to the beginning of your tool wrapper code that checks for the job outputs and returns if they already exist.
If the output already exists, the tool immediately returns, allowing failed state machine executions to fast-forward to the point of failure.
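
The gate clause described above could look roughly like the following Python sketch. It is an illustration under stated assumptions, not the whitepaper's wrapper: the function names outputs_exist and run_with_checkpoint are hypothetical, and the outputs are assumed to be files on a local or mounted file system. A wrapper that writes to Amazon S3 would pass its own existence check through the exists parameter.

```python
import subprocess
import sys
from pathlib import Path
from typing import Callable, Iterable, Sequence


def outputs_exist(paths: Iterable[str]) -> bool:
    """Return True only if every expected output file is present and non-empty.

    Assumption: outputs live on a local or mounted file system.
    """
    paths = list(paths)
    return bool(paths) and all(
        Path(p).is_file() and Path(p).stat().st_size > 0 for p in paths
    )


def run_with_checkpoint(
    command: Sequence[str],
    expected_outputs: Sequence[str],
    exists: Callable[[Iterable[str]], bool] = outputs_exist,
) -> str:
    """Run the tool unless its outputs already exist (the gate clause)."""
    if exists(expected_outputs):
        # Outputs from an earlier execution are already there: skip the work.
        return "skipped"
    # check=True raises CalledProcessError on a non-zero exit, so the job
    # fails visibly and the state machine can retry or catch the error.
    subprocess.run(command, check=True)
    return "ran"


if __name__ == "__main__":
    # Hypothetical usage: python wrapper.py <output-file> <tool> [args...]
    print(run_with_checkpoint(sys.argv[2:], [sys.argv[1]]))
```

Treating an empty file as missing is a design choice in this sketch: an interrupted job can leave a zero-byte output behind, and skipping on that would hide the failure. Writing to a temporary name and renaming on success is a common way to make the check more robust.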
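
The retry and catch behavior described earlier in this appendix is expressed in the Amazon States Language through the Retry and Catch fields of a task state. The following Python sketch builds such a state as a dictionary and prints it as JSON. It is a minimal illustration, not the whitepaper's state machine: the helper name batch_task_state, the state names CallVariants and NotifyFailure, and the job, queue, and job definition names are hypothetical, and the interval, attempt count, and backoff values are assumptions chosen for the example.

```python
import json


def batch_task_state(job_name, job_queue, job_definition,
                     next_state, failure_state, max_attempts=3):
    """Build a Step Functions task state that submits an AWS Batch job,
    retries failed attempts, and routes exhausted retries to a handler."""
    return {
        "Type": "Task",
        # Service integration that submits the job and waits for it to finish.
        "Resource": "arn:aws:states:::batch:submitJob.sync",
        "Parameters": {
            "JobName": job_name,
            "JobQueue": job_queue,
            "JobDefinition": job_definition,
        },
        # Resubmit the job when the task fails, for example after a
        # Spot Instance interruption. Values here are illustrative.
        "Retry": [
            {
                "ErrorEquals": ["States.TaskFailed"],
                "IntervalSeconds": 30,
                "MaxAttempts": max_attempts,
                "BackoffRate": 2.0,
            }
        ],
        # Once retries are exhausted, hand the error to a custom state,
        # such as one that opens a ticket in an issue tracking system.
        "Catch": [
            {
                "ErrorEquals": ["States.ALL"],
                "ResultPath": "$.error",
                "Next": failure_state,
            }
        ],
        "Next": next_state,
    }


if __name__ == "__main__":
    state = batch_task_state(
        "align-sample", "genomics-queue", "bwa-mem:1",
        next_state="CallVariants", failure_state="NotifyFailure",
    )
    print(json.dumps(state, indent=2))
```

Each retry resubmits the job to the same queue, so whether the new attempt lands on a Spot or an On-Demand Instance depends on the order of compute environments configured for that queue.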
