Hive
Apache Hive has supported Apache Ozone since Hive 4.0. To enable Hive to work with Ozone paths, ensure that the ozone-filesystem-hadoop3
JAR is added to the Hive classpath.
Supported Access Protocols
Hive supports the following protocols for accessing Ozone data:
- ofs
- o3fs
- s3a
Supported Replication Types
Hive is compatible with Ozone buckets configured with either:
- RATIS (Replication)
- Erasure Coding
Accessing Ozone Data in Hive
Hive provides two methods to interact with data in Ozone:
- Managed Tables
- External Tables
Managed Tables
Configuring the Hive Warehouse Directory in Ozone
To store managed tables in Ozone, update the following properties in the hive-site.xml
configuration file:
<property>
<name>hive.metastore.warehouse.dir</name>
<value>ofs://ozone1/vol1/bucket1/warehouse/</value>
</property>
Creating a Managed Table
You can create a managed table with a standard CREATE TABLE
statement:
CREATE TABLE myTable (
id INT,
name STRING
);
Loading Data into a Managed Table
Data can be loaded into a Hive table from an Ozone location:
LOAD DATA INPATH 'ofs://ozone1/vol1/bucket1/table.csv' INTO TABLE myTable;
Specifying a Custom Ozone Path
You can define a custom Ozone path for a database using the MANAGEDLOCATION
clause:
CREATE DATABASE d1 MANAGEDLOCATION 'ofs://ozone1/vol1/bucket1/data';
Tables created in the database d1 will be stored under the specified path:
ofs://ozone1/vol1/bucket1/data
Verifying the Ozone Path
You can confirm that Hive references the correct Ozone path using:
SHOW CREATE DATABASE d1;
Output Example:
+----------------------------------------------------+
| createdb_stmt |
+----------------------------------------------------+
| CREATE DATABASE `d1` |
| LOCATION |
| 'ofs://ozone1/vol1/bucket1/external/d1.db' |
| MANAGEDLOCATION |
| 'ofs://ozone1/vol1/bucket1/data' |
+----------------------------------------------------+
External Tables
Hive allows the creation of external tables to query existing data stored in Ozone.
Creating an External Table
CREATE EXTERNAL TABLE external_table (
id INT,
name STRING
)
LOCATION 'ofs://ozone1/vol1/bucket1/table1';
- With external tables, the data is expected to be created and managed by another tool.
- Hive queries the data as-is.
- Note: Dropping an external table in Hive does not delete the associated data.
To set a default path for external tables, configure the following property in the hive-site.xml
file:
<property>
<name>hive.metastore.warehouse.external.dir</name>
<value>ofs://ozone1/vol1/bucket1/external/</value>
</property>
This property specifies the base directory for external tables when no explicit LOCATION
is provided.
Verifying the External Table Path
To confirm the table’s metadata and location, use:
SHOW CREATE TABLE external_table;
Output Example:
+----------------------------------------------------+
| createtab_stmt |
+----------------------------------------------------+
| CREATE EXTERNAL TABLE `external_table`( |
| `id` int, |
| `name` string) |
| ROW FORMAT SERDE |
| 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' |
| STORED AS INPUTFORMAT |
| 'org.apache.hadoop.mapred.TextInputFormat' |
| OUTPUTFORMAT |
| 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' |
| LOCATION |
| 'ofs://ozone1/vol1/bucket1/table1' |
| TBLPROPERTIES ( |
| 'bucketing_version'='2', |
| 'transient_lastDdlTime'='1734725573') |
+----------------------------------------------------+
Using the S3A Protocol
In addition to ofs, Hive can access Ozone using the S3 Gateway via the S3A file system.
For more information, consult:
- The S3 Protocol
- The Hadoop S3A documentation.