This page lists selected projects and systems I have built or contributed to. Each entry includes a short, factual description and links to source code or live documentation where available.
Lucendex
A neutral, non-custodial execution layer for XRPL trading.
Repository: https://github.com/2pk03/lucendex
Website: https://lucendex.com
Lucendex is a non-custodial, deterministic routing engine for the XRPL decentralized exchange.
It indexes AMM pools and orderbook data, evaluates available paths, and produces quotes using a deterministic QuoteHash mechanism.
The service uses PostgreSQL and PL/pgSQL for indexing and routing logic, and provides Ed25519-authenticated API access.
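As a rough illustration of the quoting and authentication model, here is a minimal Python sketch of a deterministic quote hash plus an Ed25519-signed request. The field names, route encoding, and payload layout are illustrative assumptions, not Lucendex's actual wire format.

```python
import hashlib
import json

from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

# Hypothetical quote payload; the real field names and path encoding are Lucendex-specific.
quote = {
    "source": "XRP",
    "destination": "USD.rExampleIssuer",     # illustrative issuer notation
    "amount_in": "100",
    "route": ["AMM:pool_a", "OB:book_b"],    # illustrative path encoding
}

# Deterministic hashing: serialize with sorted keys and fixed separators so the
# same quote always yields the same digest, regardless of field insertion order.
canonical = json.dumps(quote, sort_keys=True, separators=(",", ":")).encode()
quote_hash = hashlib.sha256(canonical).hexdigest()

# Ed25519-authenticated call: the client signs the canonical payload and the
# server verifies the signature against the client's registered public key.
private_key = Ed25519PrivateKey.generate()   # in practice, a persistent client key
signature = private_key.sign(canonical)

print("QuoteHash:", quote_hash)
print("Signature:", signature.hex())
```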
docAI Toolkit
A Python toolkit for document analysis workflows.
Repository: https://github.com/2pk03/docai
PyPI: https://pypi.org/project/docai-toolkit/
Provides utilities for loading documents, splitting and preprocessing text, integrating embeddings or ML-based processing, and preparing inputs for AI pipelines and other downstream processing.
Available as a published package on PyPI.
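To give a feel for the kind of workflow the toolkit targets, here is a small, self-contained sketch of loading a document, normalizing it, and splitting it into overlapping chunks for an embedding step. It deliberately uses plain Python rather than the package's own API, and the file path and chunking parameters are placeholders.

```python
from pathlib import Path


def split_text(text: str, chunk_size: int = 800, overlap: int = 100) -> list[str]:
    """Split text into overlapping chunks, a common preprocessing step
    before embedding or other ML-based processing."""
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size].strip()
        if chunk:
            chunks.append(chunk)
    return chunks


# Load a document (placeholder path), normalize whitespace, and produce
# chunks ready for an embedding model or a downstream AI pipeline.
raw = Path("report.txt").read_text(encoding="utf-8")
normalized = " ".join(raw.split())
chunks = split_text(normalized)
print(f"{len(chunks)} chunks prepared")
```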
kaf-s3 Connector
A Kafka-to-S3 connector implemented in Python.
Repository: https://github.com/2pk03/kaf-s3
PyPI: https://pypi.org/project/kaf-s3-connector/
Consumes records from Kafka topics, batches them, and writes them to S3 (or compatible object storage) using configurable batching parameters and storage formats.
Published as a package on PyPI.
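Below is a minimal sketch of the consume-batch-upload pattern, written with kafka-python and boto3. The topic, brokers, bucket, and batching threshold are placeholders, and the published connector's configuration options and internals may differ from this simplified version.

```python
import time

import boto3
from kafka import KafkaConsumer  # kafka-python

consumer = KafkaConsumer(
    "events",                               # placeholder topic
    bootstrap_servers=["localhost:9092"],   # placeholder brokers
    enable_auto_commit=False,               # commit manually, after upload
    value_deserializer=lambda v: v.decode("utf-8"),
)
s3 = boto3.client("s3")

batch, max_records = [], 500

for message in consumer:
    batch.append(message.value)
    if len(batch) >= max_records:
        key = f"events/batch-{int(time.time())}.jsonl"
        body = "\n".join(batch).encode("utf-8")
        s3.put_object(Bucket="my-archive-bucket", Key=key, Body=body)  # placeholder bucket
        consumer.commit()   # commit offsets only after the batch is durably stored
        batch = []
```

Committing offsets only after the object is written keeps the pipeline at-least-once: a crash before the upload means records are re-read rather than lost.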
Scalytics-Federated / Schema→Iceberg Application
Internal system combining data ingestion, schema normalization, and processing pipelines to produce “AI-ready” data views for analytics or ML.
The architecture integrates data from arbitrary source systems or message topics, normalizes schemas, and writes unified results into an Iceberg-based data lakehouse.
Processing runs on Apache Flink, which supports both streaming and batch workloads.
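As an illustration of the write path, here is a minimal PyFlink sketch that registers an Iceberg catalog and inserts normalized rows into a target table. The catalog name, warehouse path, schema, and source table are placeholder assumptions, not the production setup.

```python
from pyflink.table import EnvironmentSettings, TableEnvironment

# Requires the Iceberg Flink runtime jar on the classpath.
t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

# Register an Iceberg catalog (Hadoop-style; the warehouse path is a placeholder).
t_env.execute_sql("""
    CREATE CATALOG lakehouse WITH (
        'type' = 'iceberg',
        'catalog-type' = 'hadoop',
        'warehouse' = 's3a://example-warehouse/iceberg'
    )
""")
t_env.execute_sql("CREATE DATABASE IF NOT EXISTS lakehouse.curated")

# Unified, "AI-ready" target table with a normalized schema (placeholder columns).
t_env.execute_sql("""
    CREATE TABLE IF NOT EXISTS lakehouse.curated.events (
        event_id      STRING,
        source_system STRING,
        payload       STRING,
        event_time    TIMESTAMP(3)
    )
""")

# raw_events stands in for a source table (e.g. a Kafka-backed table) registered elsewhere.
t_env.execute_sql("""
    INSERT INTO lakehouse.curated.events
    SELECT event_id, source_system, payload, event_time FROM raw_events
""")
```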
Apache Wayang (committer and PMC member)
Project: Apache Wayang — https://wayang.apache.org/
What Apache Wayang is:
Wayang is a unified data processing framework that lets developers write data workflows in a platform-agnostic way. It translates logical plans into an intermediate representation (the WayangPlan), optimizes them, and executes them across one or more processing engines, such as relational databases, batch engines, or stream engines, without requiring users to write engine-specific code.
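To make that idea concrete, here is a toy Python sketch of cost-based platform selection, i.e. picking the cheapest engine for each logical operator. The class names and cost models are hypothetical and do not correspond to Wayang's actual API; they only illustrate the concept.

```python
from dataclasses import dataclass
from typing import Callable

# Toy model of cross-platform plan selection: the same logical operator can be
# executed by several engines, and the planner picks the cheapest estimate.

@dataclass
class LogicalOp:
    name: str
    input_rows: int

@dataclass
class Platform:
    name: str
    cost_model: Callable[[LogicalOp], float]

platforms = [
    Platform("postgres", lambda op: op.input_rows * 0.5),          # cheap on small inputs
    Platform("spark", lambda op: 5_000 + op.input_rows * 0.01),    # high startup, cheap per row
]

def choose_platform(op: LogicalOp) -> Platform:
    return min(platforms, key=lambda p: p.cost_model(op))

plan = [LogicalOp("filter", 10_000), LogicalOp("join", 50_000_000)]
for op in plan:
    print(f"{op.name} -> {choose_platform(op).name}")
```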
Why Wayang matters:
Enables cross-platform execution: the same dataflow code can run on different engines (PostgreSQL, Spark, Flink, etc.) depending on workload and environment.
Provides cost and performance optimization: its optimizer selects the most efficient execution plan across platforms.
Supports federated or distributed data scenarios and heterogeneous data infrastructures, which is useful when data lives in several different storage or processing systems.
My contributions / involvement:
I am a committer and PMC member of Apache Wayang. My work covers development, architecture, and integration: bridging the core engine with batch and streaming data-processing pipelines, data-ingestion paths, and use cases for data-lake and AI-ready datasets.
You can find a detailed write-up of my experience and a discussion of the performance implications in my blog post “What are performance implications of distributed data processing across multiple engines”.
If you need help with distributed systems, backend engineering, or data platforms, check my Services.