Multi-Engine Data Processing and Federated Workflows with Apache Wayang

Organizations increasingly have data distributed across multiple systems: databases, data lakes, streams, warehouses and external services. Traditional pipelines require separate code for each engine, creating fragmentation and duplicated logic.

Project Objective

Provide a unified way to define data-flow logic once, and allow the execution engine to decide where and how to run it across multiple platforms.

Technology: Apache Wayang®

Apache Wayang is a cross-platform processing engine.
A user writes a pipeline once; Wayang:

analyzes the logical plan
chooses the execution engine(s) with the lowest estimated cost
executes across supported backends (databases, Spark, Flink, etc.)
hides engine-specific details behind one consistent abstraction
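
To make the "write once" idea concrete, here is a minimal sketch in plain Python. It does not use Wayang's API: `Op`, `PIPELINE`, `run_eager` and `run_streaming` are hypothetical names invented for illustration, and the two "engines" are toy stand-ins for real backends. The point is only that the pipeline definition carries no reference to the engine that runs it.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Op:
    """One logical operator; it says what to do, not where to run."""
    kind: str        # "map" or "filter"
    fn: Callable


# Declared once, with no reference to any engine (hypothetical example plan).
PIPELINE: List[Op] = [
    Op("filter", lambda x: x % 2 == 0),
    Op("map", lambda x: x * x),
]


def run_eager(ops: List[Op], data: Iterable) -> list:
    """Toy engine 1: materialise a full list after every operator."""
    rows = list(data)
    for op in ops:
        if op.kind == "map":
            rows = [op.fn(r) for r in rows]
        elif op.kind == "filter":
            rows = [r for r in rows if op.fn(r)]
        else:
            raise ValueError(f"unknown operator kind: {op.kind}")
    return rows


def run_streaming(ops: List[Op], data: Iterable) -> list:
    """Toy engine 2: chain lazy iterators and pull one record at a time."""
    stream = iter(data)
    for op in ops:
        if op.kind == "map":
            stream = map(op.fn, stream)
        elif op.kind == "filter":
            stream = filter(op.fn, stream)
        else:
            raise ValueError(f"unknown operator kind: {op.kind}")
    return list(stream)


# The same definition produces the same result on either engine.
print(run_eager(PIPELINE, range(6)))      # [0, 4, 16]
print(run_streaming(PIPELINE, range(6)))  # [0, 4, 16]
```

In a real cross-platform system the two executors would be separate engines with very different performance profiles, which is what makes the choice between them worth optimizing.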

Site: https://wayang.apache.org/
Blog article: https://www.novatechflow.com/2025/06/what-are-performance-implications-of.html

Case Study: Federated Multi-Engine Execution

A data flow may need to combine:

relational data in PostgreSQL
streaming data from Kafka
analytical queries on a Spark or Flink backend
data-lake writes into Iceberg

Traditionally, this requires a separate pipeline per system, plus glue code to move data between them. With Wayang, a single logical plan can be executed across all of these systems.
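
As a rough illustration of such a federated plan, the sketch below models the case study as a small graph in plain Python. Everything here is an assumption made for illustration: `Step`, `place` and `boundary_edges` are hypothetical helpers, not Wayang classes, and the step names are invented. Sources and sinks are pinned to the system where the data lives, while the processing steps are free to run on whichever engine is selected.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Step:
    name: str
    pinned_to: Optional[str] = None  # system that must host this step, if any


# Hypothetical plan mirroring the case study.
STEPS: List[Step] = [
    Step("read_customers", pinned_to="postgresql"),
    Step("read_events", pinned_to="kafka"),
    Step("join"),
    Step("aggregate"),
    Step("write_table", pinned_to="iceberg"),
]

# Data-flow edges between steps (producer, consumer).
EDGES: List[Tuple[str, str]] = [
    ("read_customers", "join"),
    ("read_events", "join"),
    ("join", "aggregate"),
    ("aggregate", "write_table"),
]


def place(steps: List[Step], processing_engine: str) -> Dict[str, str]:
    """Pinned steps stay where their data lives; free steps go to one engine."""
    return {s.name: (s.pinned_to or processing_engine) for s in steps}


def boundary_edges(placement: Dict[str, str],
                   edges: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Edges whose endpoints run on different systems, i.e. data must move."""
    return [(a, b) for a, b in edges if placement[a] != placement[b]]


placement = place(STEPS, "spark")
crossings = boundary_edges(placement, EDGES)
print(placement["join"])  # spark
print(len(crossings))     # 3
```

The boundary edges are where a cross-platform runtime has to move or convert data between systems, so a placement with fewer of them tends to be cheaper, all else being equal.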

Implementation Notes

The logical plan is translated into a WayangPlan.
The optimizer selects execution backends based on cost, data size and operator characteristics.
The runtime dispatches tasks to the appropriate engine.
This yields measurable gains when data lives across multiple platforms.
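
The selection and dispatch steps can be sketched with a toy cost model. This is not Wayang's optimizer: the `Backend` class, the linear cost formula and every number below are assumptions chosen only to show why data size changes the decision. Each backend has a fixed startup cost and a per-record cost, and the cheapest estimate wins.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class Backend:
    name: str
    startup_cost: float     # fixed overhead per job (arbitrary units, assumed)
    cost_per_record: float  # marginal cost per input record (assumed)

    def estimate(self, records: int) -> float:
        """Linear toy cost model: overhead plus work proportional to input."""
        return self.startup_cost + self.cost_per_record * records


# Hypothetical backends with made-up cost parameters.
BACKENDS: List[Backend] = [
    Backend("single-node", startup_cost=1.0, cost_per_record=0.01),
    Backend("cluster", startup_cost=500.0, cost_per_record=0.001),
]

# Toy executors: both compute the same result, standing in for real engines.
EXECUTORS: Dict[str, Callable[[Sequence[int]], int]] = {
    "single-node": lambda data: sum(data),
    "cluster": lambda data: sum(data),
}


def choose_backend(records: int) -> Backend:
    """Pick the backend with the lowest estimated cost for this input size."""
    return min(BACKENDS, key=lambda b: b.estimate(records))


def run_sum(data: Sequence[int]) -> Tuple[str, int]:
    """Select a backend from the input size, then dispatch the task to it."""
    backend = choose_backend(len(data))
    return backend.name, EXECUTORS[backend.name](data)


print(choose_backend(1_000).name)      # single-node
print(choose_backend(1_000_000).name)  # cluster
print(run_sum([1, 2, 3]))              # ('single-node', 6)
```

With these assumed parameters the crossover sits at roughly 55,000 records: below it the cluster's startup overhead dominates, above it the lower per-record cost wins. A real optimizer would also weigh operator characteristics and the cost of moving data between engines.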

Benefits

Single code path across heterogeneous systems.
Better performance through cross-engine optimization.
Reduced duplication of data-flow code.
Natural integration for federated or distributed datasets.

novatechflow | Alexander Alten
