Background
Organizations increasingly have data distributed across multiple systems: databases, data lakes, streams, warehouses and external services.
Traditional pipelines require separate code for each engine, creating fragmentation and duplicated logic.
Objective
Provide a unified way to define data-flow logic once, and allow the execution engine to decide where and how to run it across multiple platforms.
Technology: Apache Wayang
Apache Wayang is a cross-platform processing engine.
A user writes a pipeline once; Wayang then:
- analyzes the logical plan
- chooses the execution engine or engines that its cost-based optimizer estimates to be cheapest
- executes across supported backends (databases, Spark, Flink, etc.)
- keeps a consistent, engine-independent abstraction over those backends
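The "write once, run on different engines" idea can be illustrated with a minimal sketch. This is not Wayang's API: `Op`, `run_eager` and `run_lazy` are hypothetical names, and the two "backends" are just two in-process execution strategies standing in for real engines. The point is only that the plan is plain data, so the choice of executor can be made after the plan is written.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, List


@dataclass(frozen=True)
class Op:
    """One logical operator (hypothetical type, not part of Wayang)."""
    kind: str                    # "map" or "filter"
    fn: Callable[[Any], Any]     # user-defined function applied per record


def run_eager(plan: List[Op], data: Iterable[Any]) -> List[Any]:
    """Toy backend 1: materialize a full list after every operator."""
    rows = list(data)
    for op in plan:
        if op.kind == "map":
            rows = [op.fn(r) for r in rows]
        elif op.kind == "filter":
            rows = [r for r in rows if op.fn(r)]
        else:
            raise ValueError(f"unknown operator: {op.kind}")
    return rows


def run_lazy(plan: List[Op], data: Iterable[Any]) -> List[Any]:
    """Toy backend 2: chain lazy iterators and pull records only at the end."""
    rows = iter(data)
    for op in plan:
        if op.kind == "map":
            rows = map(op.fn, rows)
        elif op.kind == "filter":
            rows = filter(op.fn, rows)
        else:
            raise ValueError(f"unknown operator: {op.kind}")
    return list(rows)


# The plan is defined once, with no reference to either backend.
plan = [
    Op("filter", lambda x: x % 2 == 0),
    Op("map", lambda x: x * 10),
]

if __name__ == "__main__":
    print(run_eager(plan, range(6)))
    print(run_lazy(plan, range(6)))
```

Both executors receive the same `plan` object and produce the same result; only the execution strategy differs. A real cross-platform engine applies the same separation at a much larger scale.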
Site:
https://wayang.apache.org/
Blog article:
https://www.novatechflow.com/2025/06/what-are-performance-implications-of.html
Case Study: Federated Multi-Engine Execution
A data flow may need to combine:
- relational data in PostgreSQL
- streaming data from Kafka
- analytical queries on a Spark or Flink backend
- data-lake writes into Iceberg
Traditionally, this requires a separate pipeline for each system. With Wayang, a single logical plan can span all of these systems.
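One way to picture a single plan spanning several systems is to group its steps into per-platform stages and mark where data has to cross from one system to the next. The sketch below is a simplification under stated assumptions: `Step`, `split_into_stages` and `describe` are hypothetical names, the platform labels are illustrative strings, and the pipeline is linear, whereas a real plan is a graph with several sources.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Step:
    """A pipeline step tagged with the platform assumed to run it."""
    name: str
    platform: str  # illustrative label, e.g. "postgres" or "spark"


def split_into_stages(steps: List[Step]) -> List[List[Step]]:
    """Group consecutive steps that share a platform into one stage."""
    stages: List[List[Step]] = []
    for step in steps:
        if stages and stages[-1][0].platform == step.platform:
            stages[-1].append(step)
        else:
            stages.append([step])
    return stages


def describe(stages: List[List[Step]]) -> List[str]:
    """List each stage, with a transfer line at every platform boundary."""
    lines: List[str] = []
    for i, stage in enumerate(stages):
        if i > 0:
            previous = stages[i - 1][0].platform
            lines.append(f"transfer {previous} -> {stage[0].platform}")
        names = ", ".join(s.name for s in stage)
        lines.append(f"{stage[0].platform}: {names}")
    return lines


# One logical pipeline; the platform tags here are assigned by hand.
pipeline = [
    Step("read_customers", "postgres"),
    Step("filter_active", "postgres"),
    Step("join_with_events", "spark"),
    Step("aggregate", "spark"),
    Step("write_table", "iceberg"),
]

if __name__ == "__main__":
    for line in describe(split_into_stages(pipeline)):
        print(line)
```

In this sketch the platform tags are fixed up front. In a cross-platform engine that assignment is the optimizer's job, and every transfer line corresponds to a data movement whose cost has to be weighed.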
Implementation Notes
- The logical plan is translated into a WayangPlan.
- Optimizer selects execution backends based on cost, data size and operator characteristics.
- Runtime dispatches tasks to the appropriate engine.
- The approach pays off mainly when data lives across multiple platforms; the blog article linked above discusses the performance implications.
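The cost-based choice between backends can be sketched with a deliberately simple model. Everything here is an assumption made for illustration: the `CostModel` type, the `choose_backend` function, the backend names and the numbers are invented, and a real optimizer uses far richer estimates (operator types, data movement, calibrated cost functions).

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class CostModel:
    """Linear cost estimate: fixed startup overhead plus a per-record cost."""
    startup: float      # e.g. job submission or cluster scheduling overhead
    per_record: float   # marginal cost of processing one record

    def estimate(self, records: int) -> float:
        return self.startup + self.per_record * records


# Invented numbers: the local engine starts instantly but is slow per record;
# the cluster engine has a high startup cost but scales well.
BACKENDS: Dict[str, CostModel] = {
    "local": CostModel(startup=0.0, per_record=1.0),
    "cluster": CostModel(startup=10_000.0, per_record=0.01),
}


def choose_backend(records: int, backends: Dict[str, CostModel] = BACKENDS) -> str:
    """Return the name of the backend with the lowest estimated cost."""
    return min(backends, key=lambda name: backends[name].estimate(records))


if __name__ == "__main__":
    for n in (1_000, 1_000_000):
        print(n, choose_backend(n))
```

With these numbers the local engine wins for 1,000 records (estimated cost 1,000 versus 10,010) and the cluster wins for 1,000,000 records (1,000,000 versus 20,000). The same plan can therefore end up on different engines purely because the estimated input size changed.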
Benefits
- Single code path across heterogeneous systems.
- Potential performance gains from cross-engine optimization.
- Reduced duplication of data-flow code.
- Natural integration for federated or distributed datasets.