METL is a modern ETL infrastructure whose pipeline is based on Kafka Streams.

  • a modern pipeline based on Kafka Streams
  • data extracted from more than 80 microservices
  • a canonical data model with many benefits, but also challenges

Our leading analytical system at EOS

We recently made significant improvements to the ETL infrastructure of FX by introducing a modern pipeline based on Kafka Streams. The pipeline extracts data from more than 80 microservices using log-based Change Data Capture (CDC) with Debezium. To integrate the data from these microservices, we implemented a canonical data model (CDM).
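As a rough illustration of the integration step, the sketch below renames the row state of one change event to canonical attribute names. The `before`, `after`, and `op` fields follow Debezium's documented change event envelope. The column names, the mapping dictionary, and the function `to_canonical` are hypothetical and are not taken from the EOS pipeline, which runs on Kafka Streams rather than Python.

```python
from typing import Any, Dict, Optional

# Hypothetical mapping from one microservice's column names to CDM attribute names.
CUSTOMER_SERVICE_TO_CDM = {
    "cust_id": "customer_id",
    "fname": "first_name",
    "lname": "last_name",
}


def to_canonical(event: Dict[str, Any], mapping: Dict[str, str]) -> Optional[Dict[str, Any]]:
    """Map the row state of a Debezium-style change event to CDM attribute names."""
    # With schemas enabled in the JSON converter, the envelope sits under "payload".
    payload = event.get("payload", event)
    op = payload["op"]  # "c" create, "u" update, "d" delete, "r" snapshot read
    # A delete carries the last known row state in "before"; other operations use "after".
    row = payload["before"] if op == "d" else payload["after"]
    if row is None:
        return None
    # Source columns without a CDM counterpart are dropped.
    record = {cdm: row[src] for src, cdm in mapping.items() if src in row}
    record["_op"] = op  # hypothetical bookkeeping field
    return record


example_event = {
    "op": "u",
    "before": {"cust_id": 7, "fname": "Ada", "lname": "Byron"},
    "after": {"cust_id": 7, "fname": "Ada", "lname": "Lovelace", "internal_flag": 1},
}
canonical = to_canonical(example_event, CUSTOMER_SERVICE_TO_CDM)
```

A per-service dictionary like this is manageable for a handful of schemas; it is the total number of such entries across all services and schema versions that becomes the problem.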

While a CDM offers many benefits, it also presents challenges. Mapping to a CDM requires a parameter for each attribute of every extraction schema. Together, these parameters form the mapping matrix. Because the number of attributes can grow with each new schema version, the mapping matrix grows as well. We estimate that it can reach up to 1,000,000,000 elements in the EOS case. At that size, both updating the matrix and computing with it become challenging.
In our large FX team of skilled engineers, architects, and data scientists, we have found a new solution to this complexity problem.
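To make the growth concrete, here is a minimal sketch that assumes the mapping matrix is stored densely, with one row per CDM attribute and one column per attribute of every extraction schema version. The layout, the attribute names, and the helper functions are assumptions for illustration only; they are not the representation used at EOS.

```python
def build_mapping_matrix(cdm_attrs, source_attrs, mapping):
    """Return a dense 0/1 matrix: one row per CDM attribute, one column per
    source attribute. A 1 means the source attribute feeds the CDM attribute."""
    row_of = {name: i for i, name in enumerate(cdm_attrs)}
    col_of = {name: j for j, name in enumerate(source_attrs)}
    matrix = [[0] * len(source_attrs) for _ in cdm_attrs]
    for src, cdm in mapping.items():
        matrix[row_of[cdm]][col_of[src]] = 1
    return matrix


def element_count(matrix):
    """Number of stored elements, zeros included."""
    return sum(len(row) for row in matrix)


# Hypothetical attribute names.
cdm_attrs = ["customer_id", "first_name", "last_name"]

v1_attrs = ["v1.cust_id", "v1.fname", "v1.lname"]
v1_map = {"v1.cust_id": "customer_id", "v1.fname": "first_name", "v1.lname": "last_name"}
m1 = build_mapping_matrix(cdm_attrs, v1_attrs, v1_map)

# A second schema version renames one column and adds an unmapped one.
# Its attributes become additional columns next to those of version 1.
v2_attrs = ["v2.customer_no", "v2.fname", "v2.lname", "v2.email"]
v2_map = {"v2.customer_no": "customer_id", "v2.fname": "first_name", "v2.lname": "last_name"}
m2 = build_mapping_matrix(cdm_attrs, v1_attrs + v2_attrs, {**v1_map, **v2_map})

print(element_count(m1))  # 3 rows * 3 columns = 9
print(element_count(m2))  # 3 rows * 7 columns = 21
```

In this dense layout the element count is the product of the number of CDM attributes and the number of source attributes, so every new schema version adds a full set of columns even though almost all of the new entries are zero.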


We have described it in a paper titled ‘METL - a modern ETL pipeline at EOS with a dynamic mapping matrix’. In the paper, we present a new type of dynamic mapping matrix (DMM) that is based on permutation matrices obtained by block-partitioning and pattern generalization. We show that the DMM can be used for automated updates, for parallel computation in near real time, and for highly efficient compaction.
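The following sketch is not the algorithm from the paper. It only illustrates, under simplifying assumptions, why permutation-like blocks lend themselves to compaction and parallel computation: a block with at most one 1 per row can be stored as a list of column indices, and independent blocks can be applied independently. The names `compact`, `apply_block`, and `apply_blocks`, and the choice of one block per extraction schema, are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def compact(block):
    """Store a 0/1 block with at most one 1 per row as a list of column
    indices (None for an all-zero row): n entries instead of n * m."""
    indices = []
    for row in block:
        ones = [j for j, value in enumerate(row) if value]
        if len(ones) > 1:
            raise ValueError("row has more than one 1; block is not permutation-like")
        indices.append(ones[0] if ones else None)
    return indices


def apply_block(indices, values):
    """Equivalent to multiplying the block with the value vector."""
    return [None if j is None else values[j] for j in indices]


def apply_blocks(blocks, inputs):
    """Apply each compacted block to its own input vector.

    Blocks do not share state, so they can be processed in parallel;
    a thread pool stands in for whatever executor a real pipeline uses."""
    with ThreadPoolExecutor() as pool:
        results = pool.map(lambda key: apply_block(blocks[key], inputs[key]), blocks)
        return dict(zip(blocks, results))


block = [
    [0, 1, 0],
    [1, 0, 0],
    [0, 0, 1],
]
indices = compact(block)                         # [1, 0, 2]
reordered = apply_block(indices, ["a", "b", "c"])  # ["b", "a", "c"]
```

Under these assumptions a 3 x 3 block shrinks from nine stored elements to three indices, and the saving grows quadratically with the block size.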