Over the last nine months I have been working on a data migration project, which has been a new domain for me. Coming into the project I spent a fair bit of time trying to understand the domain so I could translate it into an architecture. Over time I came back to the simple answer that a migration is Extract – Transform – Load, and that is the architecture I ended up with. Where the ambiguity comes in is knowing which systems you are extracting data from and which systems you are loading the data into.
The roadblock I hit as an architect was that I wanted to identify the source and target systems that would form the solution. It became fairly clear that we would not know the answer until we had done all of the mapping from source to target. From an SDLC perspective that mapping is in effect the detailed requirements/design, which comes well after the architecting bit.
Unfortunately we were in a situation where the target systems were being designed while we were designing the migration, so we hit dependency after dependency: we were trying to map to designs that had not been locked down. As a result, the mantra came about that as a project we were “special”, and I really do think that we were, but from a project management perspective there was an expectation that the normal waterfall SDLC would be followed, run in parallel with the design of the target systems. If I had my time again, this is one issue we should have pushed a little harder on, to get a more logical alignment with the target systems design. The other solution could have been to use an iterative SDLC, but this had its own issues in an organisation that was not great at doing iterative delivery.
When architecting the migration capability, the main non-functional question we needed to address in the design was how long the migration would take. There were several approaches to this, but once again we ran into the problem of not really knowing the answer until we got closer to the finish. We did try to look at other migrations we could use as a baseline, which sounds fine in theory but in practice turned out to be a little more difficult. The technology is only one part of a migration, i.e. moving X GB of data from point A to point B and transforming it on the way using an ETL tool. Then there are the operational components, things like reconciliation and exception management, that add a variable amount of time to the migration. Talking to others who had done migrations, the operational part is huge the first time you test the process and becomes progressively more efficient with each rehearsal.
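As a rough illustration of why baselining is so hard, the migration window can be thought of as a technical transfer time plus an operational overhead that shrinks with each rehearsal. The sketch below uses entirely hypothetical figures (volumes, throughput, learning rate) just to show how much the operational component dominates early runs:

```python
# Back-of-envelope migration-window estimate. All figures are
# hypothetical; the model itself is a simplifying assumption.

def migration_window_hours(data_gb, throughput_gb_per_hour,
                           ops_overhead_hours, rehearsal_number,
                           learning_rate=0.7):
    """Technical transfer time plus operational overhead.

    The operational work (reconciliation, exception management) is
    assumed to shrink geometrically with each rehearsal as the team
    becomes more efficient at running the process.
    """
    transfer = data_gb / throughput_gb_per_hour
    ops = ops_overhead_hours * (learning_rate ** (rehearsal_number - 1))
    return transfer + ops

# First trial run: 500 GB at 50 GB/h plus 40 h of operational work.
print(migration_window_hours(500, 50, 40, rehearsal_number=1))   # 50.0
# By the fourth rehearsal the operational overhead has dropped sharply,
# even though the technical transfer time is unchanged.
print(round(migration_window_hours(500, 50, 40, rehearsal_number=4), 1))
```

The point of the model is not the numbers but the shape: the ETL transfer time is fixed and easy to benchmark, while the operational term is large, variable, and only discoverable by actually rehearsing the migration.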
The answer on performance really boiled down to getting the project into a position to do some sort of trial run as early as possible and to capture performance benchmarks at that point. From a design perspective we needed to know which parts of the design could be tuned to deliver increased performance; this included scaling the infrastructure and pre-migrating some data.
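To make the two tuning levers concrete, a minimal sketch (again with hypothetical figures) comparing how scaling the infrastructure and pre-migrating data each shrink the cutover window might look like this:

```python
# Illustrative comparison of the two tuning levers mentioned above.
# Volumes and throughput are hypothetical.

def cutover_hours(total_gb, pre_migrated_gb, throughput_gb_per_hour):
    """Time to move the data left for the cutover window itself."""
    remaining = total_gb - pre_migrated_gb
    return remaining / throughput_gb_per_hour

baseline = cutover_hours(1000, 0, 50)      # nothing tuned
scaled   = cutover_hours(1000, 0, 100)     # infrastructure scaled to 2x throughput
pre_mig  = cutover_hours(1000, 600, 50)    # 600 GB migrated ahead of the cutover
print(baseline, scaled, pre_mig)
```

Pre-migrating trades cutover time for the extra operational cost of keeping the early-moved data reconciled until go-live, which is why the trial run matters: it tells you which lever actually pays off.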
So a migration is just an ETL; the complexity lies in the details that drive the tuning of the design, both to identify the source systems and to meet the performance requirements. Move past the ETL problem (there are more than enough tools to solve it) and focus on the bigger-ticket items. But be prepared to call out dependencies on the requirements needed to solve those other problems, and look for ways to iteratively design the solution as those requirements emerge, because waterfall just won’t do it.