
case study

Sync Bridge

ETL
Backend
Node.js
Koa
Datadog
CronJob
Transactions
Conflict resolution

Built a bi-directional ETL bridge that keeps a legacy monolith and a modern full-stack JavaScript micro-services platform consistent during a long deprecation period.

Sync Bridge cover

Overview & Problem

The platform ecosystem consists of a legacy monolith and a new full-stack JavaScript platform built as micro-services.

Both systems are temporarily running in parallel in production until we deprecate the legacy monolith. Without a sync mechanism, data between the systems would quickly become inconsistent, leading to operational issues and increased maintenance overhead.

Solution

I built Sync Bridge, a Node.js ETL service built on Koa that keeps both systems in sync automatically. It handles bi-directional data flow, enforces transactional integrity, and uses Datadog observability for production reliability.
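Transactional integrity can be sketched as wrapping each sync batch in a single transaction, so a partial batch never persists. The `Tx` interface and helper below are illustrative assumptions, not the production client:

```typescript
// Hypothetical sketch: apply a batch of sync writes atomically.
// `Tx` abstracts whichever DB client is in use; names are illustrative.
interface Tx {
  query(sql: string, params?: unknown[]): Promise<void>;
  commit(): Promise<void>;
  rollback(): Promise<void>;
}

async function applyBatchAtomically(
  begin: () => Promise<Tx>,
  writes: Array<{ sql: string; params: unknown[] }>,
): Promise<void> {
  const tx = await begin();
  try {
    for (const write of writes) await tx.query(write.sql, write.params);
    await tx.commit(); // all writes land together…
  } catch (err) {
    await tx.rollback(); // …or none do, so a half-synced record never persists
    throw err;
  }
}
```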

Sync Bridge runs as a scheduled CronJob that checks for data changes every minute, comparing the updated_at field in both databases to determine the most recent change.
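The updated_at-driven detection can be sketched as a watermark filter plus a last-write-wins rule for records changed on both sides. The types and names below are assumptions for illustration, not the production schema:

```typescript
// Illustrative sketch: rows whose updated_at is newer than the last
// successful run's watermark are sync candidates; when both systems
// changed the same record, the newer updated_at wins.
interface SyncRow {
  id: string;
  updatedAt: Date;
}

function changedSince<T extends SyncRow>(rows: T[], watermark: Date): T[] {
  return rows.filter((row) => row.updatedAt > watermark);
}

function resolveConflict<T extends SyncRow>(legacy: T, modern: T): T {
  // Last-write-wins on updated_at; ties keep the modern platform's copy.
  return legacy.updatedAt > modern.updatedAt ? legacy : modern;
}
```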

System Architecture

Sync Bridge (ETL + Koa) system architecture diagram

Technical Challenges & Solutions

Code Snippets

// Cursor-based paginator for customer data. Each page is transformed concurrently.
// Flow is intentionally staged: identities -> groupings -> geo entities -> reconciliation.
// e.g. for (const stage of stages) await runStage(stage)

type FetchPage<T> = (cursor: string | null, size: number) => Promise<{
  rows: T[];
  nextCursor: string | null;
}>;

async function drainInPages<T, R>(
  fetchPage: FetchPage<T>,
  transform: (row: T) => Promise<R>,
  pageSize = 100,
): Promise<R[]> {
  const output: R[] = [];
  let cursor: string | null = null;

  while (true) {
    const { rows, nextCursor } = await fetchPage(cursor, pageSize);
    output.push(...(await Promise.all(rows.map((row) => transform(row)))));

    if (!nextCursor || rows.length === 0) break;
    cursor = nextCursor;
  }

  return output;
}

Batched, ordered ETL from customer DB into platform services with explicit reconciliation passes.
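The staged ordering the snippet's comments describe (identities, then groupings, then geo entities, then reconciliation) can be sketched as a sequential runner. Stage names and `runStage` are illustrative, not the production code:

```typescript
// Illustrative staged runner: each stage must fully complete before the
// next starts, because later entities reference rows created earlier.
type Stage = 'identities' | 'groupings' | 'geoEntities' | 'reconciliation';

const stages: Stage[] = ['identities', 'groupings', 'geoEntities', 'reconciliation'];

async function runStages(runStage: (stage: Stage) => Promise<void>): Promise<Stage[]> {
  const completed: Stage[] = [];
  for (const stage of stages) {
    await runStage(stage); // sequential on purpose: no inter-stage concurrency
    completed.push(stage);
  }
  return completed;
}
```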

// Build one CronJob manifest per customer from runtime config.
// History is kept small, runs are single-shot, and the sync entrypoint is executed directly.
function makeCustomerSyncCronJob(config: CustomerConfig) {
  return {
    apiVersion: 'batch/v1',
    kind: 'CronJob',
    metadata: {
      name: `sync-${config.customerSlug}`,
      labels: { /* business, customer, component: sync */ },
    },
    spec: {
      schedule: config.schedule,
      concurrencyPolicy: 'Forbid',
      failedJobsHistoryLimit: 0,
      successfulJobsHistoryLimit: 1,
      jobTemplate: {
        spec: {
          template: {
            spec: {
              restartPolicy: 'OnFailure',
              containers: [
                {
                  name: 'sync-worker',
                  image: config.image,
                  args: ['--require=dd-trace/init', 'dist/job.js'],
                  env: [ /* CUSTOMER_*, DB_*, upstream/downstream hosts */ ],
                },
              ],
            },
          },
        },
      },
    },
  };
}

Kubernetes CronJobs created/updated from Node; sync workload runs under dd-trace.
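Since each CronJob name is derived from the customer slug and names must be unique within a namespace, fanning out manifests benefits from a collision guard. A minimal sketch, with an assumed `CustomerConfig` shape mirroring the snippet above:

```typescript
// Illustrative guard: derive one CronJob name per customer and fail fast
// if two slugs would collide on the same Kubernetes object name.
interface CustomerConfig {
  customerSlug: string;
  schedule: string;
  image: string;
}

function manifestNames(configs: CustomerConfig[]): string[] {
  const names = configs.map((config) => `sync-${config.customerSlug}`);
  if (new Set(names).size !== names.length) {
    throw new Error('duplicate customer slug would collide on CronJob name');
  }
  return names;
}
```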

Impact & Metrics

Skills Demonstrated