Early Hadoop installations often struggled with slow-loading JobHistory pages because history files accumulated for weeks on busy clusters. This article explains how JobHistory retention worked before Hadoop 0.21, how the maxage setting introduced configurable cleanup behavior, and how administrators could safely tune or automate log pruning to keep job tracking responsive and storage under control.

In early Hadoop versions, administrators frequently noticed that the MapReduce JobHistory UI (/jobhistory.jsp) loaded very slowly on high-traffic clusters. The root cause was simple: the JobTracker kept far too many history files, sometimes accumulating tens of gigabytes of metadata that had to be parsed every time the page was rendered.

Why JobHistory Became Slow on Pre-Hadoop 0.21 Clusters

Before Hadoop 0.21, the retention policy for jobhistory logs was hardcoded to 30 days. On active clusters this produced enormous history directories; 20 GB or more was common. With such volume...
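On 0.21 and later, shortening the retention window is a one-line configuration change. A minimal sketch of a mapred-site.xml entry, assuming the 0.21-era property name mapreduce.jobtracker.jobhistory.maxage with a millisecond value; verify the exact key and units against your distribution's mapred-default.xml:

```xml
<!-- Sketch only: the property name and millisecond units are assumptions
     based on the 0.21-era JobTracker; confirm in mapred-default.xml. -->
<property>
  <name>mapreduce.jobtracker.jobhistory.maxage</name>
  <!-- Keep 7 days of job history (7 * 24 * 3600 * 1000 ms)
       instead of the old hardcoded 30 days. -->
  <value>604800000</value>
</property>
```

On pre-0.21 clusters the 30-day window cannot be changed in configuration, so pruning has to happen outside Hadoop. A hedged sketch of a cron-driven cleanup, assuming history files live on the JobTracker's local disk; the /var/log/hadoop/history path is a placeholder for whatever hadoop.job.history.location points at on your cluster:

```sh
#!/bin/sh
# Sketch: delete JobTracker history files older than 7 days.
# HISTORY_DIR is a placeholder; set it to the directory named by
# hadoop.job.history.location before scheduling this in cron.
HISTORY_DIR=/var/log/hadoop/history
find "$HISTORY_DIR" -type f -mtime +7 -delete
```

Scheduled off-peak, a prune like this keeps the history directory small enough that /jobhistory.jsp renders quickly, without patching Hadoop itself.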