Life Sciences

Transfer and Store Genomics Data on AWS

Issue link: https://read.uberflip.com/i/1331308

For more information on how AWS can help your organization with genomics, visit aws.amazon.com/health/genomics.

© 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Genomics Data Sources

A technician loads a sample on a sequencer. The sample is sequenced and the output is written to a landing folder on local on-premises storage. An AWS DataSync task is set up to sync the data from this landing (hot) folder to a bucket in Amazon Simple Storage Service (Amazon S3). Because sequencers persist genomics data in files, and genomics analysis tools take files as inputs and write files as outputs, Amazon S3 is a natural fit for genomics data, data lake analytics, and managing the data storage lifecycle on AWS.

Phenotypic Data Sources

Research scientists and clinical researchers can upload annotation and clinical data as zip files to Amazon S3 via AWS Transfer for SFTP.

Data Transfer

AWS DataSync is used to transfer raw genomics data from on-premises sequencers. Research scientists can use AWS Transfer for SFTP to transfer clinical or annotation data to Amazon S3 buckets. AWS DataSync makes it easier and more cost-effective to move large amounts of data online between on-premises storage and AWS storage services such as Amazon S3. It handles common tasks including scripting copy jobs, scheduling and monitoring transfers, validating data, and optimizing network utilization.

Storage and Archival

Optimize storage by writing instrument run data to an Amazon S3 bucket configured for infrequent access. Identify your Amazon S3 storage access patterns so that you can configure your bucket lifecycle policy optimally. Use Amazon S3 analytics storage class analysis to analyze those access patterns, with an observation period of at least 30 days, and update your lifecycle policies accordingly. Amazon Glacier is a secure, durable, and extremely low-cost storage service for data archiving.
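
The lifecycle guidance above can be sketched as a configuration document. The following is a minimal Python sketch, not an AWS-recommended policy: it builds a dictionary in the shape that boto3's `put_bucket_lifecycle_configuration` accepts, moving objects to the infrequent-access class and later to Glacier. The function name, the rule ID, the `runs/` prefix, and the 30-day and 90-day thresholds are all illustrative assumptions; real thresholds should come from your own storage class analysis.

```python
import json


def build_lifecycle_configuration(prefix, ia_after_days=30, glacier_after_days=90):
    """Build a lifecycle configuration dict for instrument run data.

    All names and thresholds here are hypothetical placeholders.
    """
    # Amazon S3 does not transition objects younger than 30 days to STANDARD_IA.
    if ia_after_days < 30:
        raise ValueError("ia_after_days must be at least 30")
    # The archive transition has to come after the infrequent-access transition.
    if glacier_after_days <= ia_after_days:
        raise ValueError("glacier_after_days must be greater than ia_after_days")
    return {
        "Rules": [
            {
                "ID": "archive-instrument-runs",  # hypothetical rule name
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
                    {"Days": glacier_after_days, "StorageClass": "GLACIER"},
                ],
            }
        ]
    }


if __name__ == "__main__":
    config = build_lifecycle_configuration("runs/")
    print(json.dumps(config, indent=2))
    # Applying it requires boto3 and AWS credentials, so it is left commented out:
    # import boto3
    # boto3.client("s3").put_bucket_lifecycle_configuration(
    #     Bucket="example-genomics-bucket",  # hypothetical bucket name
    #     LifecycleConfiguration=config,
    # )
```

Keeping the configuration as plain data makes it easy to review and to adjust the thresholds once the observation period has produced real access statistics.
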
Amazon Glacier offers multiple data retrieval tiers, with retrieval times ranging from a few minutes to several hours, so you can choose the tier that fits your specific needs.

File-Based Data Access to Amazon S3

On-premises researchers use existing bioinformatics tools with data in Amazon S3 via NFS or SMB using AWS Storage Gateway for Files. AWS Storage Gateway enables on-premises access to virtually unlimited cloud storage, helping simplify storage management. Many research organizations use third-party tools, open-source tools, or their own tools to work with their research data, and these tools usually require file system-based access to data. AWS Storage Gateway offers SMB-based or NFS-based access to data in Amazon S3, with local caching to optimize data access cost and performance.

Storage and Archival

Researchers can cloud burst from on-premises, or use data already in Amazon S3, and use Amazon FSx for Lustre as a very fast processing tier to maximize performance across all compute clusters. Amazon FSx for Lustre provides high-performance storage that can handle compute-intensive workloads, which helps speed time to insights in genomics analyses. This fully managed service delivers sub-millisecond latencies, up to hundreds of gigabytes per second of throughput, and millions of IOPS.
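
As an illustration of using data already in Amazon S3 with Amazon FSx for Lustre, the sketch below builds the keyword arguments for boto3's `fsx.create_file_system` call, linking a scratch Lustre file system to an S3 prefix through `ImportPath`. This is a hedged example rather than a prescribed setup: the function name, bucket, prefix, subnet ID, the `SCRATCH_2` deployment type, and the 1200 GiB capacity are assumptions chosen for illustration, and valid capacities depend on the deployment type you select.

```python
def build_lustre_request(bucket, prefix, subnet_id, capacity_gib=1200):
    """Build keyword arguments for fsx.create_file_system (Lustre, S3-linked).

    Every value passed in is a placeholder supplied by the caller.
    """
    if capacity_gib <= 0:
        raise ValueError("capacity_gib must be positive")
    # Normalize the prefix so the import path has no doubled or trailing slashes.
    key = prefix.strip("/")
    import_path = f"s3://{bucket}/{key}" if key else f"s3://{bucket}"
    return {
        "FileSystemType": "LUSTRE",
        "StorageCapacity": capacity_gib,
        "SubnetIds": [subnet_id],
        "LustreConfiguration": {
            "DeploymentType": "SCRATCH_2",  # assumed: short-lived processing tier
            "ImportPath": import_path,      # S3 location to expose as files
        },
    }


if __name__ == "__main__":
    request = build_lustre_request(
        "example-genomics-bucket",   # hypothetical bucket
        "runs/2020/",                # hypothetical prefix
        "subnet-0123456789abcdef0",  # hypothetical subnet ID
    )
    print(request)
    # Creating the file system requires boto3 and AWS credentials:
    # import boto3
    # boto3.client("fsx").create_file_system(**request)
```

A scratch deployment is assumed here because the text describes a temporary processing tier in front of durable data in Amazon S3; a persistent deployment may suit longer-running workloads better.
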
