Apache Ozone Best Practices at Didi: Scaling to Tens of Billions of Files

· 5 min read
Kaun-Hung (Rich) Huang
Apache Ozone Contributor
The Apache Ozone Community
Apache Ozone Project
Shilun Fan
Apache Ozone Contributor
Hongbing Wang
Apache Ozone Contributor
JiangHua Zhu
Apache Ozone Contributor
Ming Wei
Apache Ozone Contributor

Guest post by the Didi Engineering Team. For the full story with detailed slides, see Apache Ozone Best Practices at Didi (PDF).

As Didi's unstructured data grew to hundreds of petabytes spread across tens of billions of files, the company's HDFS-based storage architecture hit severe scalability bottlenecks. This post summarizes how the team migrated from HDFS to Apache Ozone, the optimizations they implemented for high-performance reads, and how they contributed those improvements back to the community.

No More Hotspots: Introducing the Automatic Disk Balancer in Apache Ozone

· 5 min read
The Apache Ozone Community
Apache Ozone Project
Wei-Chiu Chuang
Apache Ozone PMC
Yu-Chen Lai
Apache Ozone Contributor
Gargi Jaiswal
Apache Ozone Contributor
Sammi Chen
Apache Ozone Contributor

Ever replaced a drive on a Datanode only to watch it become an I/O hotspot? Or seen one disk hit 95% usage while others on the same machine sit nearly empty? These imbalances create performance bottlenecks and increase failure risk. Apache Ozone's new intra-node Disk Balancer is designed to fix this automatically.