Version: Next

Integrating Ozone Storage With Other Systems

This section provides instructions to configure other systems to use Ozone for storage.

📄️ Hive

Apache Hive has supported Apache Ozone since Hive 4.0. To enable Hive to work with Ozone paths, ensure that the ozone-filesystem-hadoop3 JAR is added to the Hive classpath.

📄️ Iceberg

Apache Iceberg is an open table format for huge analytic datasets. It is designed to address the limitations of traditional table formats like Hive and provides features such as schema evolution, hidden partitioning, and time travel.

📄️ Impala

Starting with version 4.2.0, Apache Impala provides full support for querying data stored in Apache Ozone. To use this feature, ensure that your Ozone version is 1.4.0 or later.

📄️ Spark

Apache Spark is a widely used unified analytics engine for large-scale data processing. Ozone can serve as a scalable storage layer for Spark applications, allowing you to read data from and write data to Ozone clusters using familiar Spark APIs.
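As a rough illustration of what "familiar Spark APIs" can look like, the PySpark sketch below copies a Parquet dataset between two Ozone paths. It assumes that the Ozone filesystem client JAR is already on the Spark classpath and that paths follow the ofs://<om-service-id>/<volume>/<bucket>/<key> layout. The helper names ofs_path and copy_parquet are hypothetical and exist only for this example; see the Spark page for the supported configuration.

```python
def ofs_path(om_service_id, volume, bucket, key=""):
    """Build an ofs:// URI from caller-supplied (placeholder) identifiers."""
    key = key.strip("/")
    parts = [volume, bucket] + ([key] if key else [])
    return "ofs://" + om_service_id + "/" + "/".join(parts)


def copy_parquet(spark, src, dst):
    """Read a Parquet dataset from src and write it to dst, replacing dst.

    `spark` is expected to be an existing pyspark.sql.SparkSession.
    """
    df = spark.read.parquet(src)
    df.write.mode("overwrite").parquet(dst)
```

Given an existing SparkSession named spark, a call such as copy_parquet(spark, ofs_path("ozone1", "vol1", "bucket1", "input/events"), ofs_path("ozone1", "vol1", "bucket1", "output/events")) would perform the copy. The identifiers ozone1, vol1, and bucket1 are placeholders, not defaults.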

📄️ DistCp

Hadoop DistCp is a command-line, MapReduce-based tool for bulk data copying.

📄️ Flink

Apache Flink is a powerful, open-source distributed processing framework designed for stateful computations over both bounded and unbounded data streams at any scale. It enables high-throughput, low-latency, and fault-tolerant processing while offering elastic scaling capabilities to handle millions of events per second across thousands of cores.

📄️ HBase

Apache Ozone supports integration with Apache HBase, allowing you to use Ozone as the underlying storage layer for HBase tables. This integration leverages the ofs:// scheme to provide a scalable and robust filesystem for HBase Region Servers.
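To make the role of the ofs:// scheme concrete, the sketch below generates an hbase-site.xml property that points hbase.rootdir at an Ozone path. It covers only that one setting and is not a complete HBase-on-Ozone configuration. The function name hbase_rootdir_property and the identifiers ozone1, vol1, and bucket1 are hypothetical placeholders; the HBase page remains the reference for the full set of required properties.

```python
import xml.etree.ElementTree as ET


def hbase_rootdir_property(om_service_id, volume, bucket, directory="hbase"):
    """Return a <property> element setting hbase.rootdir to an ofs:// path."""
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = "hbase.rootdir"
    ET.SubElement(prop, "value").text = (
        f"ofs://{om_service_id}/{volume}/{bucket}/{directory}"
    )
    return prop


# Hypothetical identifiers; replace them with values from your cluster.
element = hbase_rootdir_property("ozone1", "vol1", "bucket1")
print(ET.tostring(element, encoding="unicode"))
```

The printed element can be pasted inside the <configuration> block of hbase-site.xml.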