Posts

Showing posts from May, 2014

Facebook's Presto Evolution into Trino and Starburst

This retrospective revisits the early days of Facebook’s Presto engine in 2014, including installation quirks, missing security features, and benchmark comparisons with Hive and Tez. It explains how Presto’s performance and connector architecture reshaped SQL-on-Hadoop and ultimately led to the creation of Trino, the modern distributed SQL engine used today across large-scale data platforms.

From Presto to Trino: A Look Back at the Early Days of Distributed SQL

In late 2013, Facebook released Presto as an open-source distributed SQL engine. At the time, Hadoop’s dominant SQL engines, Hive (MapReduce), Hive+Tez, and early Impala, were still bound to batch-oriented execution models. Presto introduced something radically different: a low-latency, MPP-style SQL engine designed for interactive analytics at petabyte scale. This article is a 2025 retrospective based on a hands-on write-up from 2014, preserving early installation notes and benchmark results while reflecting on...

Cloudera Manager fails to upgrade Sqoop2 when parcels are enabled

Cloudera Manager fails to update the generic Sqoop2 connectors when parcels are enabled, and the Sqoop2 server no longer starts. The logs show an error like: Caused by: org.apache.sqoop.common.SqoopException: JDBCREPO_0026:Upgrade required but not allowed - Connector: generic-jdbc-connector. This issue can be fixed by adding two properties to the service safety valve of Sqoop: org.apache.sqoop.connector.autoupgrade=true and org.apache.sqoop.framework.autoupgrade=true. The failure occurs because Cloudera Manager does not automatically upgrade the default Sqoop connectors. After the properties are added, the Sqoop server should be able to update the connectors and start successfully.
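For reference, here is a sketch of the two settings written one per line, as they would be entered in the Sqoop service safety valve. The comment lines are illustrative additions, and the exact name of the safety valve field in Cloudera Manager depends on the version, so treat the placement as an assumption rather than a confirmed path.

```properties
# Allow the Sqoop2 server to upgrade connector metadata in its repository
# (addresses JDBCREPO_0026 for generic-jdbc-connector)
org.apache.sqoop.connector.autoupgrade=true
# Allow the framework metadata to be upgraded as well
org.apache.sqoop.framework.autoupgrade=true
```

The Sqoop2 server needs to be restarted after saving the change so that the settings are picked up at startup.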

Who Really Led the Hadoop Market? A Look Back at the 2014 Forrester Wave

In 2014 every Hadoop vendor claimed to be the market leader, but the Forrester Wave told a different story: the ecosystem was crowded, overlapping, and full of marketing noise. Looking back from 2025, it’s clear that none of the commercial players won. Open source won, and the industry evolved far beyond the Hadoop vendors of that era. In early 2014, Forrester Research published its well-known Forrester Wave: Big Data Hadoop Solutions, Q1 2014. The report evaluated the major players of that time (Cloudera, Hortonworks, MapR, IBM, Teradata) and declared them all “leaders.” Not surprisingly, each vendor immediately launched a marketing campaign claiming to be the one true leader. From the outside it looked almost comedic: five companies staring at the same chart, each insisting the dot representing them was the real champion. The reality? The Hadoop distribution market was crowded, competitive, and full of overlapping capabilities. Nobody led decisively, and that matters. Th...