Migrate Data from AWS S3 to Azure Blob

Amar Daxini
3 min read · Nov 15, 2021

This is the second part of a series about our cloud migration journey.

Asset migration covered raw and transcoded media and images: more than 700 TB of data and more than 300 million files moving from AWS S3 to Azure Blob. We not only wanted to migrate this data but also to restructure the entire directory hierarchy so that it aligns with our new transcoding pipeline.

In this article, we will discuss a few approaches we evaluated, along with the approach we finally selected.

1. Physical transfer using Amazon’s Snowball Edge service and Microsoft’s Data Box service

We were planning to use Amazon’s Snowball Edge service and Microsoft’s Data Box service together to effect the data transfer.

Amazon Snowball is essentially a shippable hard disk that you can use to copy data out of, or into, S3.

AWS Snowball is a service that provides secure, rugged devices, so you can bring AWS computing and storage capabilities to your edge environments, and transfer data into and out of AWS. These rugged devices are commonly referred to as AWS Snowball or AWS Snowball Edge devices.

Data Box is similar to AWS Snowball: you can use it to copy data out of, or into, Azure Blob Storage.

Data Box devices easily move data to Azure when busy networks aren’t an option. Move large amounts of data to Azure when you’re limited by time, network availability, or costs.

Here is the rough process:

  1. Ask Amazon to put the files into a Snowball Edge device, and to ship the device to our office
  2. Ask Microsoft to ship a Data Box storage device.
  3. Transfer the files from the Snowball Edge device to the Data Box Device
  4. Send the Data Box Device to Microsoft and have them copy the data to Blob

The above approach is like copying files from one hard disk to another. It looks easy to manage and would save cost, but it also has challenges: the copy between devices is manual, and a single Snowball device has limited capacity, so we would need multiple devices or multiple rounds of shipping back and forth, which adds time. Mainly because of the logistics and the manual steps involved, we skipped this approach.

2. Script-based approach

We could write a custom script to copy data from AWS to Azure. The script can be built with command-line tools or programming languages such as the AWS CLI, AzCopy, or PowerShell. It gives better control than the physical copy, since we can copy data selectively and easily restructure directories along the way.
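As an illustration, here is a minimal Python sketch of what such a script could look like, using boto3 and azure-storage-blob. The bucket, container, prefix, and connection string are placeholders, not our actual setup.

```python
# Minimal sketch of the script-based approach: stream objects from an S3
# prefix into an Azure Blob container. Assumes boto3 and azure-storage-blob
# are installed and AWS credentials are configured the usual way.
import boto3
from azure.storage.blob import ContainerClient

s3 = boto3.client("s3")
container = ContainerClient.from_connection_string(
    conn_str="<azure-storage-connection-string>",  # placeholder
    container_name="media-assets",                 # hypothetical container
)

def copy_prefix(bucket: str, prefix: str) -> None:
    """Stream every object under an S3 prefix into the Azure container."""
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            key = obj["Key"]
            body = s3.get_object(Bucket=bucket, Key=key)["Body"]
            # Keep the same key here; a restructuring step would remap it.
            container.upload_blob(name=key, data=body, overwrite=True)

copy_prefix("source-bucket", "raw/")
```

A tool like AzCopy can also copy directly from S3 to Blob on the server side, which avoids pulling the data through the machine running the script.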

The limitation of a custom script is that many basic features, such as monitoring, auditing, security, validation, and deployment, need to be developed from scratch.

So we wanted a ready-to-use tool with these features built in, rather than reinventing the wheel.

3. Azure Data Factory

We finally chose ADF, which has connectors for both sides and can copy data from AWS S3 to an Azure Blob container.

Azure Data Factory (ADF) is a fully managed, serverless data integration solution for ingesting, preparing and transforming all your data at scale.

ADF is easy to manage and track, we could monitor the data by adding an extra layer of validation, and we could easily transform the data into our new directory structure.
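To give a sense of what that extra validation layer can look like, here is a rough Python sketch that compares object counts and total bytes per prefix between S3 and Blob after a copy. All names and the connection string are illustrative only, not our actual configuration.

```python
# Post-copy validation sketch: compare (count, total bytes) per prefix
# between the source S3 bucket and the target Blob container.
import boto3
from azure.storage.blob import ContainerClient

s3 = boto3.client("s3")
container = ContainerClient.from_connection_string(
    conn_str="<azure-storage-connection-string>",  # placeholder
    container_name="media-assets",                 # hypothetical container
)

def s3_stats(bucket: str, prefix: str):
    """Return (object count, total bytes) for an S3 prefix."""
    count, size = 0, 0
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            count += 1
            size += obj["Size"]
    return count, size

def blob_stats(prefix: str):
    """Return (blob count, total bytes) for a Blob prefix."""
    count, size = 0, 0
    for blob in container.list_blobs(name_starts_with=prefix):
        count += 1
        size += blob.size
    return count, size

source = s3_stats("source-bucket", "raw/")
target = blob_stats("raw/")
print("match" if source == target else f"mismatch: s3={source} blob={target}")
```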

Some of the key learnings from our migration are:

  • We could selectively copy data (per S3 bucket) based on importance, which saved time: we did not have to wait for the entire migration to finish before starting our dev work.
  • Delta migration: we could perform delta migrations based on the files’ modified timestamps (see the sketch after this list).
  • Monitoring and alerting can be configured for failed activities, so we could rerun an activity from its failed state.
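To make the delta idea concrete, here is a conceptual Python sketch that lists only the S3 objects modified after a cutoff. The bucket, prefix, and cutoff are placeholders; in ADF this kind of last-modified filtering is configured on the copy source rather than written by hand.

```python
# Conceptual delta filter: yield keys of S3 objects modified after the
# last successful run, so only changed files are copied again.
from datetime import datetime, timezone
import boto3

s3 = boto3.client("s3")
cutoff = datetime(2021, 11, 1, tzinfo=timezone.utc)  # example cutoff

def changed_since(bucket: str, prefix: str, since: datetime):
    """Yield keys of objects whose LastModified is after the cutoff."""
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            if obj["LastModified"] > since:
                yield obj["Key"]

for key in changed_since("source-bucket", "raw/", cutoff):
    print(key)
```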

We initially faced a few challenges related to scaling and other issues, but the Microsoft team helped us resolve them quickly.

Compared to the physical approach it is costlier, but it is easier to manage and maintain, and it saved a lot of developer and DevOps time.


Amar Daxini

15+ years of experience building large, scalable products & platforms. Passionate about startups and working with new and emerging technologies.