This page lists selected projects and systems I have built or contributed to. Each entry includes a short, factual description and links to source code or live documentation where available.
KafScale (creator)
A Kafka-protocol-compatible streaming platform that uses S3 as its durable message store.
Repository: github.com/novatechflow/kafscale
Website: kafscale.io
Architecture rationale: novatechflow.com/p/kafscale.html
KafScale targets the 80% of Kafka workloads that function as durable pipes—producers write, consumers read, teams rely on replay—without requiring sub-millisecond latency, exactly-once transactions, or compacted topics.
The architecture separates concerns cleanly: stateless broker pods handle Kafka protocol traffic, S3 stores immutable log segments (11 nines durability), and etcd manages metadata, offsets, and consumer group state. Brokers are ephemeral compute; data remains durable externally.
Written in Go with a Kubernetes-native operator. Supports 21 Kafka APIs including Produce, Fetch, Metadata, and full consumer group coordination. Apache 2.0 licensed.
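Because KafScale speaks the Kafka wire protocol, stock clients work against it unchanged. The sketch below uses kafka-python; the broker address, topic name, and consumer group are placeholders, not anything shipped with KafScale.

```python
# Produce to and replay from a KafScale broker with an off-the-shelf client.
# Broker address, topic, and group id below are illustrative placeholders.
from kafka import KafkaProducer, KafkaConsumer

producer = KafkaProducer(bootstrap_servers="kafscale-broker:9092")
producer.send("orders", value=b'{"order_id": 42, "status": "created"}')
producer.flush()  # wait until the batch is acknowledged by the broker

consumer = KafkaConsumer(
    "orders",
    bootstrap_servers="kafscale-broker:9092",
    group_id="etl-replay",          # consumer-group state is kept in etcd on the KafScale side
    auto_offset_reset="earliest",   # replay the topic from the start of the log
)
for record in consumer:
    print(record.offset, record.value)
    break
```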
Stack: Go, gRPC, Protocol Buffers, S3, etcd, Kubernetes
Lucendex
A neutral, non-custodial execution layer for XRPL settlement.
Repository: github.com/2pk03/lucendex
Website: lucendex.com
Lucendex is a non-custodial, deterministic routing engine for the XRPL decentralized exchange. It indexes AMM pool and order book data, evaluates available paths, and produces quotes using a deterministic QuoteHash mechanism.
The service uses PostgreSQL and PL/pgSQL for indexing and routing logic, and provides Ed25519-authenticated API access.
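To give a feel for the idea behind a deterministic quote identifier, here is a small hedged sketch: serialize the quote inputs in a canonical order and hash them, so the same inputs always yield the same identifier. The field names and the choice of SHA-256 are illustrative, not Lucendex's actual QuoteHash scheme.

```python
# Illustrative only: a deterministic hash over canonically serialized quote
# fields. Field names and hashing choices are hypothetical.
import hashlib
import json

def quote_hash(path: list[str], amount_in: str, amount_out: str, ledger_index: int) -> str:
    canonical = json.dumps(
        {
            "path": path,
            "amount_in": amount_in,
            "amount_out": amount_out,
            "ledger_index": ledger_index,
        },
        sort_keys=True,         # fixed key order
        separators=(",", ":"),  # no whitespace variance
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# Identical inputs always produce the same identifier, so a quote can be re-derived and verified.
print(quote_hash(["XRP", "USD.rExampleGateway"], "100", "98.7", 87654321))
```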
Stack: PostgreSQL, PL/pgSQL, Ed25519 authentication
docAI Toolkit
A Python toolkit for document analysis workflows.
Repository: github.com/2pk03/docai
PyPI: pypi.org/project/docai-toolkit/
Provides utilities for loading documents, splitting and preprocessing text, integrating embeddings or other ML-based processing, and preparing inputs for AI pipelines and downstream processing.
Available as a published package on PyPI.
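To illustrate the kind of preprocessing involved, here is a generic chunking sketch in plain Python; it intentionally does not mirror docai-toolkit's own API, which is documented in the repository.

```python
# Generic illustration of splitting a document into overlapping chunks for
# embedding or other downstream AI processing; not docai-toolkit's API.
def split_text(text: str, chunk_size: int = 800, overlap: int = 100) -> list[str]:
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap  # step forward, keeping some context overlap
    return chunks

document = "Quarterly report. " * 200   # placeholder document text
for chunk in split_text(document):
    pass  # hand each chunk to an embedding model or an AI pipeline
```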
Stack: Python
kaf-s3 Connector
A Kafka-to-S3 connector implemented in Python.
Repository: github.com/2pk03/kaf-s3
PyPI: pypi.org/project/kaf-s3-connector/
Case study: Kafka-to-S3 Connector: Large Message Offloading and Scalable ETL
Consumes records from Kafka topics, batches them, and writes them to S3 or S3-compatible object storage, with configurable batching parameters and storage formats.
Published as a package on PyPI.
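The core consume, batch, and upload loop looks roughly like the sketch below, written directly against kafka-python and boto3 rather than with the connector's own configuration; topic, bucket, and batch thresholds are placeholders.

```python
# Rough sketch of the consume -> batch -> upload pattern; the published
# connector wraps this behind configurable parameters and storage formats.
import time

import boto3
from kafka import KafkaConsumer

s3 = boto3.client("s3")
consumer = KafkaConsumer(
    "events",                            # placeholder topic
    bootstrap_servers="broker:9092",
    group_id="kaf-s3-sketch",
)

batch, last_flush = [], time.time()
for record in consumer:
    batch.append(record.value.decode("utf-8"))
    # Flush on size or age, the two usual batching knobs.
    if len(batch) >= 500 or time.time() - last_flush > 60:
        key = f"events/{int(time.time())}.jsonl"
        s3.put_object(
            Bucket="my-archive-bucket",  # placeholder bucket
            Key=key,
            Body="\n".join(batch).encode("utf-8"),
        )
        batch, last_flush = [], time.time()
```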
Stack: Python, Apache Kafka, S3
Scalytics-Federated / Schema→Iceberg Application
An internal system combining data ingestion, schema normalization, and processing pipelines to produce "AI-ready" data views for analytics or ML.
Organization: scalytics.io
Case study: Apache Wayang Federated Multi-Engine Processing Case Study
The architecture ingests data from arbitrary source systems or message topics, normalizes schemas, and writes unified results into an Iceberg-based data lakehouse. Processing runs on Apache Flink, which covers both streaming and batch workloads.
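On the sink side, the Flink-to-Iceberg handoff can be sketched with the PyFlink Table API, assuming the Flink Iceberg connector is available at runtime; the catalog name, warehouse path, table schema, and the normalized source view are placeholders.

```python
# Sketch of writing normalized records into an Iceberg table from Flink.
# Requires the Iceberg Flink runtime jar; names and paths are placeholders.
from pyflink.table import EnvironmentSettings, TableEnvironment

t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

# Register an Iceberg catalog backed by object storage (placeholder warehouse).
t_env.execute_sql("""
    CREATE CATALOG lakehouse WITH (
        'type' = 'iceberg',
        'catalog-type' = 'hadoop',
        'warehouse' = 's3://my-warehouse/'
    )
""")
t_env.execute_sql("CREATE DATABASE IF NOT EXISTS lakehouse.analytics")

# Unified target table for the normalized, AI-ready view.
t_env.execute_sql("""
    CREATE TABLE IF NOT EXISTS lakehouse.analytics.events (
        event_id STRING,
        source   STRING,
        payload  STRING,
        ts       TIMESTAMP(3)
    )
""")

# normalized_events stands in for a previously registered source or view
# produced by the ingestion and normalization stages.
t_env.execute_sql(
    "INSERT INTO lakehouse.analytics.events SELECT * FROM normalized_events"
)
```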
Stack: Apache Flink, Apache Iceberg, Apache Wayang, data lakehouse architecture
Apache Wayang (PMC member and committer)
Project: Apache Wayang
Apache committer profile: wayang.apache.org/docs/community/team — listed as PMC, Committer (Apache ID: aloalt)
What Apache Wayang is:
Wayang is a unified data processing framework that allows developers to write data workflows in a platform-agnostic way. It translates logical plans into an intermediate representation (WayangPlan), then optimizes them and executes them across one or more processing engines—relational databases, batch engines, stream engines—without requiring users to write engine-specific code.
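The mechanism can be pictured with a deliberately tiny, conceptual sketch (not Wayang's actual API, which is Java/Scala): a logical operator has several engine-specific candidates, and a cost estimate decides which engine runs it.

```python
# Conceptual toy only: illustrates cost-based engine selection in the spirit
# of Wayang's optimizer. Engine names and the cost model are hypothetical.
from dataclasses import dataclass

@dataclass
class EngineChoice:
    engine: str
    estimated_cost: float  # arbitrary cost units

def choose_engine(cardinality: int) -> EngineChoice:
    # Small inputs stay on a single JVM; large inputs justify a distributed engine.
    candidates = [
        EngineChoice("java-streams", cardinality * 1.0),
        EngineChoice("spark", 5_000 + cardinality * 0.1),
    ]
    return min(candidates, key=lambda c: c.estimated_cost)

print(choose_engine(1_000))        # -> java-streams
print(choose_engine(10_000_000))   # -> spark
```

In Wayang itself this decision is made per operator across the whole WayangPlan, so a single workflow can span several engines.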
Why Wayang matters:
- Enables cross-platform execution: same data flow code can run on different engines (PostgreSQL, Spark, Flink, etc.) depending on workload and environment.
- Provides cost and performance optimization: its optimizer selects the most efficient execution plan across platforms.
- Supports federated or distributed data scenarios and heterogeneous data infrastructures, useful when data lives in multiple, different storage or processing systems.
My contributions:
I contribute as a committer and PMC member. My work spans development, architecture, and integration: bridging Wayang's core engine with batch and streaming data-processing pipelines, building data-ingestion integrations, and developing use cases for data lakes and AI-ready datasets.
Detailed write-up: What are performance implications of distributed data processing across multiple engines
Stack: Java, Scala, Apache Spark, Apache Flink, PostgreSQL, cross-platform optimization
If you need help with distributed systems, backend engineering, or data platforms, check my Services.