Life Sciences

Whitepaper: Genomics Data Transfer, Analytics, and Machine Learning using AWS Services

Issue link: https://read.uberflip.com/i/1358110

Table of Contents

Abstract
    Abstract
Introduction
Transferring genomics data to the Cloud and establishing data access patterns using AWS DataSync and AWS Storage Gateway for files
    Recommendations
    Reference architecture
        File-based access to Amazon S3
Running secondary analysis workflows using AWS Step Functions and AWS Batch
    Recommendations
    Reference architecture
Performing tertiary analysis with data lakes using AWS Glue and Amazon Athena
    Recommendations
    Reference architecture
Performing tertiary analysis with machine learning using Amazon SageMaker
    Recommendations
    Reference architecture
Conclusion
Appendix A: Genomics report pipeline reference architecture
Appendix B: Research data lake ingestion pipeline reference architecture
Appendix C: Genomics data transfer, analytics, and machine learning reference architecture
Appendix D: Compliance resources
Appendix E: Optimizing data transfer, cost, and performance
Appendix F: Optimizing storage cost and data lifecycle management
Appendix G: Optimizing secondary analysis compute cost
Appendix H: Handling secondary analysis task errors and workflow failures
Appendix I: Monitoring secondary analysis workflow status, cost, and performance
Appendix J: Scaling secondary analysis
Appendix K: Optimizing the performance of data lake queries
Appendix L: Optimizing the cost of data lake queries
Contributors
Document revisions
Notices
