Migrating a Large Volume of Data

One of the big problems when performing a migration is moving a large volume of data across the network. This may have to happen during normal business operation if you are dealing with an organisation that runs 24×7, or you may have exclusive use of the network out of hours. Either way, you could be moving a large amount of data, and transferring 1GB will take about 15 minutes over a 10 Mbps network.

Remember also that you will need to execute the migration multiple times throughout the project. In the early stages this will just be with test data, but you will need to run it several times with production data during mock runs.

The first option is simply to wear the time cost of moving the data off your production systems onto the migration platform. This may or may not be acceptable, depending on the outage the business will accept for the entire migration. One alternative is to increase the bandwidth of the network: if you increase it from 10 Mbps to 100 Mbps, that same 1GB of data would take only about a minute and a half to transfer.
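The transfer times quoted above are easy to sanity-check. A minimal sketch of the arithmetic, assuming an efficiency factor of 0.9 to stand in for protocol overhead (the exact overhead will vary with your network and transfer tooling):

```python
def transfer_time_seconds(size_gb: float, bandwidth_mbps: float,
                          efficiency: float = 0.9) -> float:
    """Estimate wall-clock seconds to move size_gb over a link of
    bandwidth_mbps, discounted by an assumed efficiency factor."""
    megabits = size_gb * 8 * 1000  # 1 GB is roughly 8000 megabits (decimal units)
    return megabits / (bandwidth_mbps * efficiency)

print(transfer_time_seconds(1, 10) / 60)   # roughly 15 minutes at 10 Mbps
print(transfer_time_seconds(1, 100))       # roughly 90 seconds at 100 Mbps
```

Running the same estimate against the full data volume you expect to migrate, rather than a single gigabyte, quickly tells you whether your outage window is realistic.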

Another way of tackling this problem is to reduce the amount of data you have to transfer on the day of the migration by pre-migrating some of it. Some data can be categorised as read-only and so can be migrated days or weeks before the actual migration occurs. The key is to identify this data; it could be reference data, orders that have already been delivered, and so on. Once you have identified the data that can be migrated early, the migration on the day only needs to move the data that hasn't already been migrated. How significant the saving is depends on the nature of the data to be migrated.
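The split between pre-migratable and day-of data can be sketched as a simple predicate over the records. The order records and the "delivered means read-only" rule here are hypothetical examples, not from the original system:

```python
# Hypothetical order records; delivered orders are treated as read-only.
orders = [
    {"id": 1, "status": "delivered"},
    {"id": 2, "status": "open"},
    {"id": 3, "status": "delivered"},
]

def is_pre_migratable(order: dict) -> bool:
    # Assumption: a delivered order no longer changes, so it can move early.
    return order["status"] == "delivered"

early_batch = [o for o in orders if is_pre_migratable(o)]      # weeks before
day_of_batch = [o for o in orders if not is_pre_migratable(o)]  # on the day
```

The hard part in practice is not the filter itself but agreeing with the business which categories of data genuinely never change after a certain point.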

Change data capture (CDC) provides a pattern where a copy of the data is taken, usually from a backup or by other means, and then kept in sync by replicating changes to the copy. Firstly, the copy of the data can be obtained from the source system in a non-invasive manner by restoring an offline backup. The tool will then generally use the source system's log to capture the changes to replicate to the copy. This usually puts a negligible performance load on the source system, so it should not impact production running; how much impact there is ultimately comes down to the way the vendor has implemented CDC.
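The replication half of the pattern can be illustrated in a few lines: an initial copy is kept in sync by replaying change records harvested from the source's log. The change-record shape below is a made-up illustration; real CDC tools each have their own log formats and delivery mechanisms:

```python
# Initial copy, as restored from an offline backup of the source system.
copy = {1: {"name": "Alice"}, 2: {"name": "Bob"}}

# Hypothetical change records read from the source's transaction log.
change_log = [
    {"op": "update", "key": 2, "row": {"name": "Robert"}},
    {"op": "insert", "key": 3, "row": {"name": "Carol"}},
    {"op": "delete", "key": 1, "row": None},
]

def apply_changes(copy: dict, changes: list) -> dict:
    """Replay logged changes against the copy, in log order."""
    for c in changes:
        if c["op"] == "delete":
            copy.pop(c["key"], None)
        else:
            # Insert and update both reduce to an upsert on the copy.
            copy[c["key"]] = c["row"]
    return copy

apply_changes(copy, change_log)
```

Because only the deltas cross the network after the initial restore, the volume moved on the day shrinks to whatever changed since the backup was taken.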

There are a few alternatives for dealing with the problem of moving a significant volume of data as part of a migration. As outlined, migration is much more than ETL, and the mapping rules that tell you what data to migrate and which systems to get it from won't be known until relatively late in the project. So from an architecture perspective you may decide on an approach to move the data, but you may also need a fallback strategy in case you find the timings are too long.


