Version: v2.5

Migration and Integration

Hopp is built for data migration. Integration comes up alongside it, and the two are easily conflated.

A migration is a one-time, long-running move of data from a source system to a target system. It runs until the data reaches a known quality level, and ends at cutover.
An integration is the continuous synchronization of data between systems that are both live. It has no end. As long as both systems run, the synchronization keeps running with them.

Both tasks read data from one system, transform it, and write it to another. From a distance they look like the same work.

They are not.

Migration and integration are different kinds of work, and the difference is worth stating clearly. This article sets out where the line falls, and how to tell which side of it a piece of work belongs on.

Migration is a project with an end

A migration moves data from a source system to a target system. It runs as a project over an extended period: the work moves the data toward a known quality level and finishes at cutover. A migration has an end, and everyone involved is working toward it.

This is long-running, sustained work. The data does not arrive clean and complete on the first pass. The team runs the migration, reads the results, corrects what is wrong, and runs it again. That cycle repeats until the data is good enough to go live.

Figure: Migration. A migration climbs toward a known quality level, and stops at cutover.

Errors in the source data are par for the course. A migration is in part the work of finding those errors and correcting them. Hopp is built around that reality. The Engine reports what it finds. The Portal presents the results. The team iterates until the data is good enough to go live.

A data problem is not an emergency.

It is the next item of work. That is one half of the clearest test for telling migration and integration apart.

Integration keeps live systems aligned

Integration keeps data synchronized between systems that are both live and in day-to-day use. A common case is feeding operational data into a data warehouse. Another is keeping different systems in step with each other when a business runs many of them.

Figure: Integration. An integration runs every day, with no end, as long as both systems are live.

Integration is unattended by design, and people step in only when something goes wrong.

When something does go wrong, it is serious. A failure in an integration is an incident. It can block a business process, and it usually calls for immediate attention and escalation.

This is the other half of the test. The same data problem that is routine in a migration is an incident in an integration.

Why the distinction matters for Hopp

Everything Hopp puts in front of people is built for migration work. The Portal, the iterate-and-correct loop, the team looking at results: all of this exists for one reason. A migration is the work of correcting data with people in the loop. An integration has none of that. Nobody sits and watches the Portal in an integration scenario. They watch for alarms.

Hopp is built for data migration, and it excels at it. An ETL tool can be pushed to run a migration: it moves and transforms data, which is part of the work. But you would build the rest of the migration yourself, and it would not match what Hopp brings. For pure integration, ETL and integration tools are the right choice. For migration, Hopp is.

Staged migrations

Between those two cases sits a situation that causes a lot of confusion.

Some migrations are staged. The source system and the target system run side by side for a period before the source is retired. A project that plans a staged go-live often assumes the target will need a steady feed of data from the source until cutover.

Why there are not many

Few migrations are staged, and there are good reasons for that.

In most cases the data can be partitioned cleanly between the two systems. Each system owns a distinct set of records, so neither needs updates from the other. A customer account that has moved to the new system is worked on there. One that has not stays on the old system.

Sometimes a small change in the old system supports this split. A "migrated" marker, often set from the output of a Hopp migration run, makes a record read-only in the old system and sends the user to the new one to work with it. That modest effort can remove the need for any interim synchronization.
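The marker idea can be sketched in a few lines. This is a minimal illustration and not Hopp functionality: `LegacyAccount`, `mark_migrated`, `update_name`, and the `migrated` flag are all hypothetical names, and the set of migrated ids is assumed to come from whatever a migration run reports.

```python
from dataclasses import dataclass


class MigratedRecordError(Exception):
    """Raised when a user tries to edit a record that now lives in the new system."""


@dataclass
class LegacyAccount:
    account_id: str
    name: str
    migrated: bool = False  # hypothetical marker column added to the old system


def mark_migrated(accounts: dict[str, LegacyAccount], migrated_ids: set[str]) -> int:
    """Set the marker on every known account listed as migrated; return how many changed."""
    count = 0
    for account_id in migrated_ids:
        account = accounts.get(account_id)
        if account is not None and not account.migrated:
            account.migrated = True
            count += 1
    return count


def update_name(accounts: dict[str, LegacyAccount], account_id: str, new_name: str) -> None:
    """Edit a legacy account, refusing if it has already moved to the new system."""
    account = accounts[account_id]
    if account.migrated:
        raise MigratedRecordError(
            f"Account {account_id} is read-only here; use the new system."
        )
    account.name = new_name
```

Because every record is editable in exactly one system, neither system needs updates from the other for that record.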

In other cases the staged approach itself does not survive contact with reality. External dependencies make it too complicated, too risky, or simply impossible. The project then settles on a single cutover instead.

The point is that real interim synchronization is rare. Most staged migrations end up designing it away, and most migrations are not staged in the first place.

The ones that remain

Some migrations genuinely are staged, and they have a real interim period. During that interim period the source system is still in daily use, so data keeps being created and changed there. The target must be kept current with those changes until cutover.

Where it genuinely is needed, one-directional interim synchronization is still part of the migration:

It ends at cutover, the same way the migration does.
It uses the same Map. Updating the target during the interim period uses the Source and Target Map from the migration itself, run repeatedly and incrementally rather than once.
Its data problems are still migration data problems. They are handled by the same team, through the same iterate-and-correct process.

Figure: Interim period. Through the interim period the migration Map is run again and again, feeding the target one way, until the source system is retired.

Hopp supports this one-directional interim synchronization. Keeping the new system fed from the old one until cutover is part of delivering the migration.
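The shape of an incremental, one-way run can be sketched as follows. This is an illustration under stated assumptions, not Hopp's implementation: `run_incremental` is a hypothetical function, `transform` stands in for the migration Map, and the source rows are assumed to carry a `changed_at` timestamp that can serve as a watermark. Deletions in the source are not handled here.

```python
from datetime import datetime
from typing import Callable


def run_incremental(
    source_rows: list[dict],
    target: dict[str, dict],
    transform: Callable[[dict], dict],
    last_run: datetime,
) -> datetime:
    """Apply `transform` to rows changed since `last_run` and upsert them into `target`.

    Returns the new watermark to store for the next run.
    """
    watermark = last_run
    for row in source_rows:
        if row["changed_at"] <= last_run:
            continue  # already delivered by an earlier run
        mapped = transform(row)  # same mapping logic as the full migration run
        target[mapped["id"]] = mapped  # one-way upsert: the source always wins
        watermark = max(watermark, row["changed_at"])
    return watermark
```

Each run picks up only what changed since the previous one, and the feed stops for good when the source system is retired at cutover.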

Two-way synchronization

Sometimes a project wants duplex interim synchronization, running both ways. The source updates the target, and the target also updates the source.

Hopp can handle the transformation side of a reverse, target-to-source feed. A Map runs in one direction, but the mapping language and the Engine do not care which direction that is, so a reverse Map can be built.

The reverse Map is not the forward Map flipped around. A migration Map cleans and normalizes data as it runs, so it cannot simply be run in reverse. It has to be written from scratch, as a separate Map that starts from the target system and maps back to the source.

The harder half is not transformation. Two-way synchronization also has to:

decide which side wins when the same record changed on both sides between runs,
prevent a change from echoing back and forth between the systems,
reconcile records that were created or deleted on one side but not the other.

None of that is expressed by a Map. It is a separate body of work, and it is genuinely integration. When a project needs true two-way synchronization, a dedicated integration tool is the right fit for that part.
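To make the three concerns concrete, here is a small sketch of what each one looks like in code. Everything in it is hypothetical: the `Change` record, the `origin` tag, and the "most recent change wins" rule are assumptions chosen for illustration, and a real integration tool would offer richer policies than these.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Change:
    record_id: str
    value: str
    changed_at: datetime
    origin: str  # system where the change was first made: "source" or "target"


def resolve(source_change: Change, target_change: Change) -> Change:
    """Conflict rule (an assumption): the most recent change wins, ties go to the source."""
    if target_change.changed_at > source_change.changed_at:
        return target_change
    return source_change


def changes_to_send(changes: list[Change], sender: str) -> list[Change]:
    """Echo prevention: a system forwards only the changes that originated there."""
    return [change for change in changes if change.origin == sender]


def one_sided(source_ids: set[str], target_ids: set[str]) -> tuple[set[str], set[str]]:
    """Reconciliation: ids present on only one side, each needing a create-or-delete decision."""
    return source_ids - target_ids, target_ids - source_ids
```

None of these functions transforms data. They are policy decisions about live systems, which is why they sit outside the Map.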

The boundary is the same as before. Hopp delivers the migration, including a one-directional interim period. Continuous two-way synchronization between live systems is integration, and an integration tool is the right choice for it.

Telling the two apart

Some pieces of work are genuinely hard to pin down. A feed that runs for months, a go-live that keeps slipping, a target that needs fresh data well before cutover: any of these can look like either kind of work. The label attached to the work is not always reliable.

A few questions sort it out:

Does the work have a cutover date, or does it run with no end in sight?
When the data is wrong, is fixing it the next task, or is it an emergency?
Will someone look at the results, or will people step in only when an alarm goes off?
Is the data being moved once, toward a known quality level, or kept aligned between systems that are both live?

Answers that point to a cutover, errors treated as ordinary work, and people reviewing the results describe a migration.

Answers that point to no defined end, errors treated as incidents, and unattended operation describe integration.

One piece of work can hold both, and that is fine. A staged migration with one-directional interim synchronization that ends at cutover is still a migration. A continuous two-way feed between live systems is integration. The value is in naming each part for what it is, rather than letting one word cover both.
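The four questions can be written down as a simple checklist. The sketch below is only a way to make the test explicit; `WorkProfile` and `classify` are hypothetical names, and treating any split answer as "mixed" is an assumption rather than a rule from Hopp.

```python
from dataclasses import dataclass


@dataclass
class WorkProfile:
    has_cutover_date: bool             # does the work end at a cutover?
    errors_are_routine: bool           # is a data error the next task, not an emergency?
    results_are_reviewed: bool         # do people look at results, not just alarms?
    moved_once_to_quality_level: bool  # moved once, rather than kept aligned?


def classify(profile: WorkProfile) -> str:
    """Return "migration", "integration", or "mixed" from the four answers."""
    answers = [
        profile.has_cutover_date,
        profile.errors_are_routine,
        profile.results_are_reviewed,
        profile.moved_once_to_quality_level,
    ]
    if all(answers):
        return "migration"
    if not any(answers):
        return "integration"
    return "mixed"  # split answers: name each part of the work separately
```

A "mixed" result is a prompt to break the work into parts and ask the questions again for each part.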

Summary

Migration is a long-running project that ends at cutover. Data errors are par for the course and corrected as part of the work. Hopp is built for this and excels at it.
Integration is the continuous alignment of live systems. It has no end, and its failures are operational incidents. Dedicated integration and ETL tools are the right fit for this.
Staged migrations rarely need interim synchronization. Few migrations are staged. Most run to a single cutover, and staged ones usually partition the data so that no live synchronization is needed.
Where one-directional interim synchronization is genuinely needed, it is part of the migration, and Hopp supports it.
Two-way synchronization has to settle clashing changes, stop updates echoing between systems, and reconcile records that exist on only one side. A Map does not express any of that. It is integration work, and an integration tool is the right choice for it.
