Life Sciences

Whitepaper: Genomics Data Transfer, Analytics, and Machine Learning using AWS Services

Issue link: https://read.uberflip.com/i/1358110

Appendix A: Genomics report pipeline reference architecture

The following shows an example end-to-end genomics report pipeline architecture that uses the reference architectures described in this paper.

Figure 5: Genomics report pipeline reference architecture

1. A technician loads a genomic sample on a sequencer.
2. The genomic sample is sequenced, and the output is written to a landing folder on a local on-premises storage system.
3. An AWS DataSync task is preconfigured to sync data from the parent directory of the landing folder on the on-premises storage to a bucket in Amazon S3.
4. A run completion tracker script, running as a cron job, starts a DataSync task execution to transfer the run data to the Amazon S3 bucket. An inclusion filter can be applied to a task execution so that only a given run folder is transferred, and exclusion filters can be used to keep specific files out of the transfer. In addition, consider using a zero-byte file as a flag when uploading the data: technicians indicate that a run has passed a manual QA check by placing an empty file (the success file) in the run folder, and the tracker script starts a sync only when that success file is present.
5. DataSync transfers the data to Amazon S3.
6. An Amazon CloudWatch Events event is raised, and an Amazon CloudWatch Events rule matches it and launches an AWS Step Functions state machine.
7. The state machine orchestrates secondary analysis and report generation tools that run in Docker containers on AWS Batch.
8. Amazon S3 stores the intermediate files produced by the jobs in each state machine execution.
9. Optionally, the last tool in the state machine workflow uploads the report to the Laboratory Information Management System (LIMS).
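
The QA flag check described in step 4 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the whitepaper's tracker: the flag file name `QA_PASSED`, the function `find_ready_runs`, and the practice of keeping a set of already-synced run names are all hypothetical. Each run is assumed to write to its own folder directly under the directory that the DataSync task uses as its source.

```python
from pathlib import Path

# Hypothetical flag name; the whitepaper only specifies "an empty file".
SUCCESS_FLAG = "QA_PASSED"


def find_ready_runs(landing_parent, already_synced=frozenset()):
    """Return the names of run folders that contain a zero-byte QA success
    flag and have not been synced yet, in sorted order."""
    ready = []
    for run_dir in sorted(Path(landing_parent).iterdir()):
        # Skip stray files and runs that were already handed to DataSync.
        if not run_dir.is_dir() or run_dir.name in already_synced:
            continue
        flag = run_dir / SUCCESS_FLAG
        # Only an empty flag file counts; a non-empty file is treated as "not ready".
        if flag.is_file() and flag.stat().st_size == 0:
            ready.append(run_dir.name)
    return ready
```

Requiring the flag to be exactly zero bytes keeps the convention strict: a technician's marker is unambiguous, and a partially written or mistakenly named file does not trigger a transfer.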
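
Once a run folder is ready, the tracker can start a DataSync task execution restricted to that folder. The sketch below assumes a boto3 DataSync client (for example, `boto3.client("datasync")`) is passed in, and it uses that client's `start_task_execution` call with `Includes` and `Excludes` filter rules of type `SIMPLE_PATTERN`. The helper name `start_run_sync` is hypothetical, and the include value assumes that the task's source location is the parent directory of the landing folders, so a run folder is addressed as `/<run name>`; check the exact pattern syntax against the DataSync filtering documentation before relying on it.

```python
def start_run_sync(datasync_client, task_arn, run_name, exclude_patterns=()):
    """Start one DataSync task execution that transfers only `run_name`.

    Assumes `datasync_client` behaves like boto3.client("datasync").
    Returns the ARN of the new task execution.
    """
    request = {
        "TaskArn": task_arn,
        # Path relative to the task's source location (assumed to be the
        # parent directory of the per-run landing folders).
        "Includes": [{"FilterType": "SIMPLE_PATTERN", "Value": f"/{run_name}"}],
    }
    if exclude_patterns:
        # Several patterns are combined into one rule, separated by "|".
        request["Excludes"] = [
            {"FilterType": "SIMPLE_PATTERN", "Value": "|".join(exclude_patterns)}
        ]
    response = datasync_client.start_task_execution(**request)
    return response["TaskExecutionArn"]
```

Passing the client in, rather than creating it inside the function, keeps the helper easy to exercise with a stub and lets a cron-driven tracker reuse one client for every run folder that has passed QA.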
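
Steps 5 and 6 can be wired together with a CloudWatch Events rule (the same capability is also offered as Amazon EventBridge, whose boto3 client is `events`). The sketch below assumes that DataSync reports task execution state changes with source `aws.datasync`, detail-type `DataSync Task Execution State Change`, and a `detail.State` value of `SUCCESS`; confirm these against the DataSync monitoring documentation. `matches` is a deliberately simplified, hypothetical stand-in for the service's own pattern matching, included only to show which events would launch the state machine. `create_rule` uses the `put_rule` and `put_targets` calls; the target `Id` and the IAM role are placeholders.

```python
import json

# Assumed shape of DataSync task execution state-change events.
DATASYNC_SUCCESS_PATTERN = {
    "source": ["aws.datasync"],
    "detail-type": ["DataSync Task Execution State Change"],
    "detail": {"State": ["SUCCESS"]},
}


def matches(pattern, event):
    """Simplified event-pattern check: every pattern key must be present in
    the event, nested dicts are matched recursively, and lists hold the
    allowed exact values. Real rules support more operators (e.g. prefix)."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


def create_rule(events_client, rule_name, state_machine_arn, role_arn):
    """Create the rule and target the Step Functions state machine.

    Assumes `events_client` behaves like boto3.client("events").
    """
    events_client.put_rule(
        Name=rule_name,
        EventPattern=json.dumps(DATASYNC_SUCCESS_PATTERN),
        State="ENABLED",
    )
    events_client.put_targets(
        Rule=rule_name,
        Targets=[{"Id": "genomics-report-pipeline", "Arn": state_machine_arn, "RoleArn": role_arn}],
    )
```

As written, the rule fires for any DataSync task in the account and Region; in practice it could be narrowed to the sequencing task, for example by also matching on the event's `resources` field.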
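
The orchestration in steps 7 through 9 can be expressed as an Amazon States Language definition in which each tool is an AWS Batch job started through the `arn:aws:states:::batch:submitJob.sync` integration, so the state machine waits for each job to finish before moving on. This is a sketch under assumptions: `build_report_pipeline`, the `UploadToLIMS` state, and the `WORK_PREFIX` environment variable are hypothetical. Giving each execution its own S3 prefix for intermediate files (built with the `States.Format` intrinsic function and the `$$.Execution.Name` context object) is one possible layout for step 8, not one the whitepaper prescribes.

```python
def build_report_pipeline(job_queue, steps, work_bucket, lims_job_definition=None):
    """Build an Amazon States Language definition (as a dict) that runs each
    (state_name, job_definition) pair as an AWS Batch job, one after another.

    If `lims_job_definition` is given, a final hypothetical UploadToLIMS
    step is appended (step 9 is optional in the architecture).
    """
    steps = list(steps)
    if lims_job_definition is not None:
        steps.append(("UploadToLIMS", lims_job_definition))
    if not steps:
        raise ValueError("at least one Batch step is required")

    states = {}
    for i, (name, job_definition) in enumerate(steps):
        state = {
            "Type": "Task",
            # .sync makes Step Functions wait for the Batch job to complete.
            "Resource": "arn:aws:states:::batch:submitJob.sync",
            "Parameters": {
                "JobName": name,
                "JobQueue": job_queue,
                "JobDefinition": job_definition,
                "ContainerOverrides": {
                    "Environment": [{
                        "Name": "WORK_PREFIX",
                        # Per-execution S3 prefix for intermediate files.
                        "Value.$": f"States.Format('s3://{work_bucket}/executions/{{}}/', $$.Execution.Name)",
                    }]
                },
            },
            # Discard the Batch response so the original input reaches the next step.
            "ResultPath": None,
        }
        if i + 1 < len(steps):
            state["Next"] = steps[i + 1][0]
        else:
            state["End"] = True
        states[name] = state

    return {
        "Comment": "Genomics report pipeline (sketch)",
        "StartAt": steps[0][0],
        "States": states,
    }
```

The resulting dictionary could be serialized with `json.dumps` (which writes `ResultPath` as JSON `null`) and passed as the `definition` argument of the boto3 Step Functions `create_state_machine` call, together with a `name` and an IAM `roleArn`.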
